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Abstract 

Importance  sampling  is  a  technique  that  is  commonly  used  to  speed 
up  Monte  Carlo  simulation  of  rare  events.  However,  little  is  known 
regarding  the  design  of  efficient  importance  sampling  algorithms  in  the 
context  of  queueing  networks.  The  standard  approach,  which  simulates 
the  system  using  an  a  priori  fixed  change  of  measure  suggested  by  large 
deviation  analysis,  has  been  shown  to  fail  in  even  the  simplest  network 
setting  (e.g.,  a  two-node  tandem  network). 

Exploiting  connections  between  importance  sampling,  differential 
games,  and  classical  subsolutions  of  the  corresponding  Isaacs  equa¬ 
tion,  we  show  how  to  design  and  analyze  simple  and  efficient  dynamic 
importance  sampling  schemes  for  general  classes  of  networks.  The 
models  used  to  illustrate  the  approach  include  d-node  tandem  Jackson 
networks  and  a  two  node  network  with  feedback,  and  the  rare  events 
studied  are  those  of  large  queueing  backlogs,  including  total  population 
overflow  and  the  overflow  of  individual  buffers. 
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1  Introduction 


For  more  than  two  decades,  there  has  been  a  growing  of  interest  in  fast 
simulation  techniques  for  estimating  probabilities  of  rare  events  in  queueing 
networks.  Among  the  available  techniques  importance  sampling,  a  method 
in  which  the  system  is  simulated  under  a  different  probability  distribution 
(i.e.,  change  of  measure),  has  received  much  attention  [12,  2]. 

The  standard  approach  to  importance  sampling  for  queueing  considers 
an  a  priori  fixed  and  static  change  of  measure  that  is  suggested  by  large 
deviation  analysis.  This  approach  was  shown  to  lead  to  efficient  importance 
sampling  algorithms  for  simulating  large  buildups  of  a  single/multiple  server 
queue  [1,  15].  However,  rather  little  success  has  been  accomplished  in  ex¬ 
tending  this  standard  heuristic  to  networks  of  queues.  In  even  the  simplest 
network  setting,  such  as  a  two-node  tandem  Jackson  network,  the  change 
of  measure  suggested  by  the  standard  heuristic,  which  amounts  to  exchange 
the  arrival  rate  and  the  smallest  service  rate  [14] ,  fails  to  be  asymptotically 
optimal  in  general  [11]  and  can  lead  to  importance  sampling  estimators  with 
infinite  variance  [3] .  This  failure  is  in  fact  due  to  the  discontinuities  of  the 
state  dynamics  on  the  boundaries  of  the  state  space.  Such  discontinuities 
are  not  present  in  the  case  of  a  single  queue. 

The  purpose  of  the  present  paper  is  to  present  a  framework  under  which 
one  can  systematically  build  simple  and  efficient  dynamic  (i.e.,  state-dependent) 
importance  sampling  schemes  for  simulating  rare  events  in  queueing  net¬ 
works.  Our  method  heavily  exploits  a  recently  discovered  connection  be¬ 
tween  importance  sampling  and  deterministic  differential  games  [7,  8]  and 
the  role  of  classical  subsolution  of  the  Isaacs  equation  associated  with  the 
game  [9,  10].  We  demonstrate  that  one  can  construct  classical  subsolutions 
that  lead  to  simple  and  efficient  importance  sampling  schemes.  As  in  [10], 
such  subsolutions  will  be  identified  as  the  mollification  of  the  pointwise  min¬ 
imum  of  affine  functions. 

To  illustrate  the  main  idea,  we  focus  in  much  of  the  paper  on  d-node 
tandem  Jackson  queueing  networks.  The  rare  events  of  interest  are  various 
types  of  buffer  overflows,  including  total  population  overflow  and  individ¬ 
ual  buffer  overflows.  We  also  discuss  extensions  to  more  general  queueing 
networks.  To  the  best  of  our  knowledge,  the  present  paper  is  the  first  to 
provide  a  rigorous  theoretical  framework  in  which  one  can  build  asymptot¬ 
ically  optimal  importance  sampling  algorithms  for  rare  events  in  networks 
of  queues. 

The  paper  is  organized  as  follows.  Section  2  gives  a  brief  review  of  the 
basics  of  importance  sampling.  In  Section  3,  we  study  in  detail  the  classical 
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problem  of  total  population  overflow  in  two-node  tandem  Jackson  networks. 
Restricting  to  the  two-node  case  permits  a  presentation  of  all  the  key  steps 
in  the  subsolution  approach  with  a  minimum  of  notational  inconvenience. 
The  generalization  to  d-node  tandem  Jackson  networks  and  general  buffer 
overflow  problems  will  be  discussed  in  Section  4.  We  address  the  application 
of  the  subsolution  approach  to  a  two-node  Jackson  network  with  feedback  in 
Section  5.  To  ease  exposition,  most  proofs  are  collected  in  the  appendices. 

2  Basics  of  importance  sampling 

The  basic  idea  of  importance  sampling  is  to  use  a  change  of  measure,  that 
is,  the  system  is  simulated  under  a  different  probability  distribution  and 
the  outcomes  are  multiplied  by  appropriate  likelihood  ratios  (i.e.,  Radon- 
Nikodym  derivatives)  to  form  unbiased  estimators. 

We  specialize  to  the  estimation  of  rare  event  probabilities  and  consider 
a  family  of  events  {An}  in  a  probability  space  (0,  T,  P)  such  that 

lim--logP(A„)  =  7 
n  n 

for  some  positive  constant  7.  In  order  to  estimate  P(d.„),  importance  sam¬ 
pling  generates  samples  under  a  probability  measure  Q  such  that  P  is  ab¬ 
solutely  continuous  with  respect  to  Q,  and  forms  an  estimator  by  averaging 
independent  replications  of 

-  1 

where  dP/dQ  is  the  Radon-Nikodym  derivative,  or  the  likelihood  ratio.  Al¬ 
though  Q  may  depend  on  n,  to  simplify  the  notation  this  is  not  made  explicit. 
It  is  easy  to  check  that  this  importance  sampling  estimator  is  unbiased  since 

E%n]=nAn). 

Its  rate  of  convergence  is  determined  by  the  variance  of  Since  the  last 
display  holds,  minimizing  the  variance  is  accomplished  when  one  minimizes 
the  second  moment,  which  can  be  written 

[2nd  moment  of  pn]  =  E'^[pl}\  =  E^[pn\.  (2.1) 

The  smaller  the  second  moment,  the  faster  the  convergence.  However, 
Jensen’s  inequality  implies  that 

1  2 

limsup - log  in'® [p^]  <  limsup - logE'^[pn]  =  27. 

n  ^  n  ^ 
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In  other  words,  the  exponential  decay  rate  of  the  second  moment  can  be 
at  most  twice  that  of  the  probability.  We  say  the  importance  sampling 
estimator  is  asymptotically  optimal  if  the  upper  bound  is  achieved,  i.e.,  if 

liminf - logE^lpn]  >  27. 

n  n 

Sometimes  27  is  referred  to  simply  as  the  “optimal  decay  rate.” 

Remark  2.1.  The  requirement  that  P  be  absolutely  continuous  with  respect 
to  Q  is  more  stringent  than  necessary.  It  is  sufficient  that  P  be  absolutely 
continuous  with  respect  to  Q  on  a  sub-cr-algebra  that  contains  An,  in  which 
case  the  likelihood  ratio  is  defined  as  the  Radon-Nikodym  derivative  of  P  and 
Q  when  they  are  restricted  on  this  sub-cr-algebra.  In  this  paper,  the  changes 
of  measure  will  be  applied  to  a  sequence  of  iid  random  variables  {Y (A:)},  and 
will  be  restricted  to  the  cr-algebra  generated  by  {Y(k)}  up  until  the  time 
either  the  buffer  overflow  happens  or  the  system  returns  to  the  empty  state. 
Note  that  when  considered  on  the  full  cr-algebra  generated  by  {y(A:)},  it  is 
typical  that  P  is  singular  with  respect  to  Q. 

3  Two-node  tandem  Jackson  networks 

To  illustrate  the  main  idea  of  the  game/subsolution  approach  toward  im¬ 
portance  sampling,  we  specialize  to  two-node  Jackson  tandem  queueing  net¬ 
works,  where  the  arrival  process  is  Poisson  with  rate  A  and  the  service  times 
are  distributed  exponentially  with  rates  /ci  and  112 ,  respectively.  The  system 
is  assume  to  be  stable,  that  is,  A  <  min{//i,  //2}. 

A 


Suppose  that  the  two  queues  share  one  buffer  with  capacity  re,  and  that 
we  are  interested  in  the  overflow  probability 

=  P  {network  total  population  reaches  re  before  returning  to  0, 
starting  from  0}  . 

This  overflow  problem  was  among  the  first  to  be  studied  in  the  literature 
on  importance  sampling  for  networks,  and  has  served  as  a  benchmark  since 
then  [14] .  One  reason  for  the  interest  in  this  particular  event  is  that  it  can 
be  used  to  get  bounds  on  various  related  probabilities. 
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Rescaling  the  time  variable  will  have  no  effect  on  pn,  and  so  without 
loss  of  generality  we  assume  X  +  pi  +  fi2  =  1.  Since  exchanging  the  order 
of  service  rates  does  not  affect  this  probability  [17],  we  further  assume  that 
1^2  <  /^i-  Under  these  conditions,  we  have  the  large  deviation  limit  [11] 

hm--logp„  =  log^  =  7-  (3.1) 

n  n  A 

3.1  The  standard  heuristic 

Based  on  a  heuristic  application  of  large  deviation  analysis,  [14]  proposed  a 
state-independent  importance  sampling  algorithm  for  estimating  which 
amounted  to  interchanging  the  arrival  rate  and  the  smallest  service  rate. 
That  is,  under  the  new  measure,  the  system  has  arrival  rate  //2  and  con¬ 
secutive  service  rates  pi  and  A.  Even  though  [14]  offered  no  theoretical 
justification,  numerical  experiments  suggested  good  performance  of  the  cor¬ 
responding  importance  sampling  estimator  for  a  certain  range  of  parameters. 

A  rigorous  analysis  of  this  importance  sampling  algorithm  first  appeared 
in  [11],  in  which  the  authors  showed  that  the  algorithm  is  asymptotically 
optimal  in  certain  subsets  of  the  set  of  all  possible  parameters.  However, 
it  was  also  shown  that  the  asymptotic  optimality  fails  for  some  parameter 
values,  such  as  when  the  two  service  rates  pi  and  p2  are  nearly  equal  and 
the  arrival  rate  A  is  small.  A  recent  paper  [3]  extended  the  results  in  [11]  and 
showed  that  the  importance  sampling  estimator  can  have  infinite  variance 
for  certain  parameters.  Additional  discussion  on  importance  sampling  for 
queueing  networks  can  be  found  in  the  survey  paper  [12]. 

To  the  best  of  our  knowledge,  the  present  paper  is  the  first  to  present 
an  asymptotically  optimal  (or  even  provably  good!)  importance  sampling 
scheme  for  even  the  relatively  simple  class  of  all  stable  two-node  tandem 
Jackson  networks. 

3.2  The  system  dynamics 

The  system  state  can  be  described  by  the  embedded  discrete  time  Markov 
chain  Z  =  {Z{k)  :  fe  =  0, 1,  2, . . .},  which  is  defined  on  a  probability  space 
(H,  T,  P).  The  state  represents  the  queue  lengths  at  the  transition  epochs  of 
the  tandem  network:  Z(k)  =  {Zi{k),  Z2{k))  where  Zi{k)  is  the  length  of  the 
queue  at  node  i  after  the  A:-th  transition.  Obviously,  Z  can  only  take  values 
at  and  pn  equals  the  probability  that  Z\  -|-  Z2  reaches  n  before  returning 
to  0,  given  that  the  system  is  initially  empty. 
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At  times  when  both  queues  are  non-empty,  the  increments  of  the  Markov 
chain  Z  take  values  in  the  space 

V  =  {UO  =  (1,  0),  Vi  =  (-1,  1),  V2  =  (0,  -1)}  , 

with  corresponding  to  an  arrival  and  Vi  to  a  service  at  node  i  for  i  =  1,2. 
On  the  boundary  where  either  queue  is  empty,  the  dynamics  exhibit  different 
behaviors.  Suppose  that  the  queue  at  node  i  (i  =  1,2)  is  empty.  Then  it 
is  impossible  for  the  process  Z  to  have  increment  Vi  since  it  will  lead  to 
negative  queue  size.  One  way  to  describe  this  discontinuity  in  dynamics  is 
to  allow  Z  to  make  fictitious  jumps  of  size  Vi  on  the  boundary,  but  they 
have  to  be  accounted  for  by  “pushing  back”  the  state  along  the  direction  of 
constraints 

di  =  -Vi, 

so  that  the  state  process  Z  stays  non-negative. 

To  summarize,  the  evolution  of  the  Markov  chain  Z  can  be  modeled  by 
equation 

Z{k  +  1)  =  Z{k)  +  7r[Z{k),  Y{k  +  1)],  (3.2) 

where  Y  =  {Y(k)  :  /c  >  1}  is  a  sequence  of  random  variables  taking  values 
in  the  space  V,  and  the  mapping  vr  is  defined  for  every  z  =  {zi,  Z2)  G 
and  y  G  V  as 

r  ,  .  f  0,  Zi  =  0  and  v  =  Vi  for  some  i  =  1,  2 
tt\z,  v\  =  s  .  ■  (3.3) 

[  y,  otherwise 

The  distribution  of  Z  is  completely  determined  by  that  of  the  sequence 
Y  =  {Y{k)}.  Define 

T+(V)  =  {6  =  {6q,  61,  62)  :  0  is  a  probability  measure  on  V 

and  9i  =  9[vi]  >  0  for  every  i  =  0, 1,  2}. 

Under  the  (true)  probability  measure  P,  T  is  a  sequence  of  independent 
identically  distributed  (iid)  random  variables  with  distribution 

0  =  (A,/ri,/i2)  gT+(V). 

3.3  The  dynamic  importance  sampling  algorithms 

The  importance  sampling  schemes  we  consider  use  state-dependent  changes 
of  measure  that  can  be  characterized  by  stochastic  kernels  0”[-|-]  on  V  given 
M^,  i.e,  0”[-|a:]  G  (P+(V)  for  every  x  G  M+. 
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Figure  1:  The  system  dynamics 


To  be  more  precise,  for  a  given  threshold  n,  define  the  scaled  state  process 
X”  =  Z/n,  where  Z  is  defined  as  in  (3.2).  Since  the  definition  of  tt  implies 
7r[nx,  y]  =  7r[x,y]  for  every  x  G  it  is  not  difficult  to  see  that  satisfies 
the  equation 

X^{k  +  1)  =  X^{k)  +  ^Tr[X^{k),Y{k  +  1)],  (3.4) 

with  initial  condition  X”(0)  =  Z(0)/n  =  0.  The  importance  sampling 
generates  {Y(k)}  as  follows.  The  conditional  probability  of  T(/c  +  1)  =  Vi, 
given  {Y{j)  :  j  =  1,  2  . . . ,  fe},  is  just  0”[uj|X”(/c)]  for  each  i  =  0, 1,  2. 

Define  the  hitting  times 

Tn  =  mf{k  >  0  :  X^{k)  +  X^{k)  =  1} 

To  =  mf{k  >  1  :  X^{k)  =  X^{k)  =  0}. 

Let  An  be  the  event  of  interest,  that  is. 

An  =  {X^  +  X2  reaches  1  before  returning  to  0}  =  {T„  <  Tq}. 

The  importance  sampling  estimator  is  just 

V  =Ia  W  _  Q[y{k  +  1)] 

Pn  Qn^Y{k  +  l)\X^{k)y  ^  ’ 
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The  second  moment  of  pn,  thanks  to  (2.1),  equals  E^[pn]-  The  goal  is 
to  choose  a  stochastic  kernel  0”  so  that  this  second  moment  (whence  the 
variance  of  pn)  is  as  small  as  possible.  Another  important  consideration  is 
that  one  would  like  0”  to  be  simple  and  easy  to  implement. 

Remark  3.1.  The  standard  heuristic  importance  sampling  algorithm  simu¬ 
lates  the  system  using  the  state  independent  change  of  measure  under  which 
{Y (k)}  is  iid  with  distribution  0s  =  fJ-i,  A).  This  corresponds  to  the  spe¬ 
cial  choice  of  stochastic  kernel  0”  with  0”[-|a;]  =  0s  for  every  x. 

3.4  Notation  and  terminology 

Before  we  proceed  to  construct  importance  sampling  algorithms,  we  collect 
in  this  section  some  notation  and  terminology.  Define 

D  =  {{xi,X2)  ■■  Xi  >  0,Xl  +  X2  <  1}  , 

D  =  {{xi,X2)  ■■  Xi  >  0,Xl+ X2  <  1}  , 

di  =  {(0,X2)  :  0  <  X2  <  1}, 

02  =  {(xi,0)  :  0  <  Xi  <  1}, 

de  =  {{xi,X2)  ■■  Xi  >0,Xi+ X2  =  1}  , 

Dn  =  D  r\  {{zi,  Z2) / n  ■.  {zi,  Z2)  ^  , 

Dr,  =  D  r\  {{zi,  Z2) / n  ■.  {zi,  Z2)  ^  1?+} . 

Sometimes  we  refer  to  as  the  “exit  boundary.” 


Figure  2:  Domains  and  boundaries 
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3.5  The  Isaacs  equation 


The  main  purpose  of  this  section  is  to  derive  the  Isaacs  equation  associated 
with  the  limit  differential  game  that  lies  underneath  importance  sampling  al¬ 
gorithms.  The  derivation  will  be  kept  formal.  A  rigorous  argument,  though 
possible,  is  not  necessary  for  our  purpose. 

Recall  our  goal  is  to  choose  a  stochastic  kernel  0"  so  as  to  keep  the  second 
moment  E^[p^  as  small  as  possible.  We  can  think  of  this  as  a  stochastic 
control  problem  and  write  down  the  corresponding  Dynamic  Programming 
Equation  (DPE).  To  this  end,  we  extend  the  dynamics  and  let,  for  every 


Vn{x)  =  inf  El[pn\  =  inf  El 

Qn  Qn 


u.-  n 


e[Y{k  +  i)] 


e-[Y{k  +  i)\x-{k)] 


where  pn  is  defined  in  exactly  the  same  fashion  as  in  Section  3.3  and  E^ 
denotes  expected  value  conditioned  on  X”(0)  =  x. 

For  simplicity,  we  further  assume  that  x  G  whence  'K[x,y\  =  y  for 
every  y  G  V.  Under  the  original  probability  measure  P,  the  sequence  {Y {k)} 
is  iid  with  distribution  0.  Hence  the  DPE 


2 

Vn{x)  =  inf  ^ 
06y+(V) 


14, 


1 

X  -I-  —Vi 

n 


©K 


Q[vi 


holds.  Consider  a  logarithmic  transform  of  14  and  define 

Wn{x)  =  --logl4(x). 
n 

We  have 


114  (x)  =  _  sup - log  ^  exp  -nH4 

0ey+(V)  n 


1 

X  H — u, 

n 


-  log 


Qbi] 

©bi] 


©bi]- 


A  key  step  in  the  derivation  is  to  apply  the  relative  entropy  representation 
for  exponential  integrals  to  the  right-hand-side  of  the  last  equation.  For 
completeness,  we  include  the  representation  in  its  general  form  in  Remark 
3.3.  It  follows  that 


114  (x)  =  sup  inf 

0ey+(V)  ®ey+(V) 


^  H4  [  x  -h  -Uj  I  9[vi 
Li=0 


1 


+-  I]^b*]iog 


u=0 


Qbd 

©bi] 


R(bl©) 
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Note  that  taking  infimum  over  9  £  iP'*'(V)  is  equivalent  to  taking  infimum 
over  9  G  iP(V)  since  by  Remark  3.3  the  minimizing  9  is  mutually  absolutely 
continuous  to  0  ,  whence  it  belongs  to  !?■*■(¥). 

Suppose  for  now  that  Wn{x)  converges  to  W{x).  Formally  assume  the 
approximation 


Wn  [x  +  -  Wn{x)  «  -{DW{x),  Vi), 

\  n  J  n 

where  DW  is  the  gradient  of  W.  Observing  ^  9[vi]  =  1,  we  arrive  at 

2 


0  =  sup  inf 
0ey+(V)  seV+iY) 


{DW{x),¥m  +  J]  9[vi]  log  11^  +  R(0||0) 
i=o 


,  (3.6) 


where 

2 

1^(0)  =  (3-7) 

i=0 

for  each  9  G  1P"'“(V).  Equation  (3.6)  is  called  an  Isaacs  equation. 

We  now  discuss  the  boundary  conditions.  For  the  exit  boundary,  we  have 
by  definition  Vn{x)  =  1  or  Wn{x)  =  0,  therefore  we  impose  the  Dirichlet 
boundary  condition 

W{x)  =  0,  for  X  G  9e-  (3.8) 

For  di  and  82,  we  impose  the  Neumann  boundary  condition  that  is  typically 
associated  with  constrained  dynamics  [13] 

{DW{x),di)  =  t),  for  X  €  di-  (3.9) 

Finally,  we  make  a  few  remarks  on  the  game  interpretation  of  importance 
sampling.  The  Isaacs  equation  (3.6)  indicates  that  the  underlying  game 
has  two  players.  The  player  who  chooses  the  change  of  measure  in  order 
to  minimize  the  second  moment  (i.e.,  0)  becomes  the  maximizing  player 
in  the  game  due  to  the  negative  sign  in  the  logarithmic  transform.  The 
minimizing  player  is  artificially  introduced,  and  chooses  9.  We  will  refer 
to  this  player  as  the  “large  deviation  player.”  The  dynamics  of  the  game 
are  completely  determined  by  9,  or  the  choice  of  the  large  deviation  player, 
while  the  running  cost  of  the  game  depends  on  the  choices  of  both  players. 

Remark  3.2.  The  original  dynamics  have  initial  condition  x  =  0,  and  W{0) 
characterizes  the  asymptotic  exponential  decay  rate  of  the  second  moment. 
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Remark  3.3.  Relative  Entropy  Representation  for  Exponential  Integrals. 
Let  (5,  J")  be  a  measurable  space  and  /  :  S'  ^  M  a  bounded  measurable 
function.  Denote  by  1P(S)  the  space  of  probability  measures  on  (S,  S').  Then 
for  any  7  G  T(S), 


inf 

0ey(S) 


Furthermore,  the  minimizer  of  the  right-hand-side  exists  and  is  mutually 
absolutely  continuous  with  respect  to  7.  Here  the  relative  entropy  R(-||-)  is 
defined  as 


[  CXD 


if  0  <C  7 
otherwise 


We  refer  the  readers  to  [4,  Proposition  1.4.2]  for  the  proof. 


3.6  The  properties  of  the  Hamiltonian 

Our  construction  of  importance  sampling  algorithms  is  based  on  classical 
subsolutions  to  the  Isaacs  equation.  Therefore,  it  is  useful  to  study  the 
properties  of  this  equation.  To  this  end,  define  for  each  p  G 


Hip)  =  sup  inf 
0ey+(V)  ®ey+(V) 


2  ~ 

(p,  F(0))  +  ^  0N  log  ^  +  R(0||0) 


i=0 


(3.10) 


The  function  H  is  called  the  Hamiltonian,  and  the  Isaacs  equation  (3.6)  can 
be  written  as 

m{DW)=0.  (3.11) 

We  have  the  following  result,  whose  proof  is  deferred  to  Appendix  C. 

Proposition  3.4.  LetM  be  defined  as  in  (3.10). 

1.  Eor  eaehp  =  (pi,P2)  £  there  exists  a  saddle  point  (0*(p),  0*{p))  G 
!?■''(¥)  X  T+(V)  given  by 


0*(p)  =  e*{p)  =  N{p)  , 


where 


N{p)  = 


-1 


In  partieular,  the  order  of  sup  and  inf  ean  be  exehanged  in  (3.10). 
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2.  We  have  the  representation 


H(p)  =  [(p,F(0))  +  2i?(0||0)]  =  21ogiV(p). 

In  partieular,  H  is  eoneave. 

Figure  3  is  a  picture  of  the  zero-level  curve  of  H.  Recall  that  7,  as  defined 
in  (3.1),  equals  log(/r2/A). 


n  =  27(-1,-1) 

r2  =  27(-l,0) 

r3  =  (0,0) 

f2  =  21og(pii/A)(-l,0) 


Figure  3:  Hamiltonian  H 


Remark  3.5.  For  any  p  G  we  will  refer  to  Q*{p)  as  the  (saddle  point) 
change  of  measure  corresponding  to  p. 

3.7  The  solution  to  the  Isaacs  equation 

In  [7,  8],  the  saddle  point  strategy  generated  by  the  solution  to  the  Isaacs 
equation  was  used  to  construct  efficient  importance  sampling  schemes.  How¬ 
ever,  viscosity  solutions  to  the  Isaacs  equation  (3.11)  and  boundary  condi¬ 
tions  (3.8),  (3.9),  which  are  only  weak-sense  solutions,  are  not  suitable  for 
the  purpose  of  constructing  efficient  importance  sampling  algorithms  for  this 
tandem  Jackson  network. 

More  precisely,  consider  the  very  simple,  affine  function 

Ws{x)  =  {ri,x)  +  27. 

This  function  is  a  viscosity  solution  to  the  Isaacs  equation  (3.11)  and  bound¬ 
ary  conditions  (3.8),  (3.9).  It  is  in  fact  the  maximal  viscosity  solution,  and 
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is  the  “physically  significant”  solution,  in  the  sense  that  Ws{x)/2  equals  the 
asymptotic  decay  rate  of  pn  when  X(0)  =  x.  Even  though  VEs(O)  =  27,  the 
optimal  decay  rate,  the  saddle  point  strategy  corresponding  to  Ws  does  not 
lead  to  efficient  importance  sampling  algorithms.  Indeed,  thanks  to  Propo¬ 
sition  3.4  and  straightforward  calculation,  the  0-component  in  the  saddle 
point  is 

Q*{DWs)  =  e*{ri)  =  {p2,liu\), 

which  is  exactly  the  state-independent  change  of  measure  0s  based  on  stan¬ 
dard  heuristic;  see  Section  3.1. 

As  remarked  previously,  the  failure  of  the  importance  sampling  based  on 
Ws  is  due  to  the  fact  that  Ws  is  only  a  weak-sense  viscosity  solution.  It  is 
not  a  classical  solution  (or  even  a  classical  subsolution  as  defined  in  the  next 
subsection) ,  since  on  the  boundary  82 

{DWs,  d2)  =  (ri,  d2)  =  -27  <  0. 

In  a  sense  that  we  will  make  precise  later  on,  this  inequality  is  in  the  “wrong” 
direction,  which  suggests  that  the  (artificial)  large  deviation  player,  who 
determines  the  game  dynamics,  may  be  able  to  exploit  this  “bad”  boundary 
to  a  degree  that  the  importance  sampling  estimator  based  on  Ws  becomes 
inefficient.  It  is  not  coincidental,  as  observed  in  [11],  that  the  inefficiency  of 
0s  in  general  is  because  a  sample  path  can  spend  a  significant  amount  of 
time  near  boundary  82  before  leaving  domain  D  and  thereby  accumulate  a 
huge  Radon-Nikodym  derivative. 

Remark  3.6.  This  example  shows  even  when  there  is  an  ostensibly  smooth 
viscosity  solution  to  the  Isaacs  equation,  it  may  not  always  lead  to  efficient 
importance  sampling  algorithms. 

3.8  Subsolutions  and  importance  sampling  schemes 

The  idea  of  [9,  10]  is  that  classical  subsolutions  to  Isaacs  equations  can  be 
used  to  construct  efficient  importance  sampling  schemes.  It  has  advantages 
over  solution-based  importance  sampling  schemes  in  simplicity,  greater  flex¬ 
ibility,  and  general  applicability.  The  goal  of  this  section  is  to  construct 
classical  subsolutions  and  identify  the  corresponding  changes  of  measure. 
The  analysis  of  the  asymptotic  behaviors  of  the  importance  sampling  esti¬ 
mator  will  be  carried  out  in  Appendix  B. 

Definition  3.7.  A  function  W  :  D  — >  M  is  said  to  be  a  classical  subsolution 
to  the  Isaaes  equation  (3.11)  and  boundary  conditions  (3.8),  (3.9)  if 
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1.  W  is  continuously  differentiable, 

2.  ]HI(L)Vl^(x))  >  0  for  every  x  £  D, 

3.  W{x)  <0  for  X  e  5e, 

4-  {DW{x),  di)  >  0  for  X  €  di,  i  =  I,  2. 

As  in  [9,  10],  the  construction  of  classical  subsolutions  are  divided  into 
two  steps.  We  first  identify  a  subsolution  as  the  minimum  of  affine  functions 
and  then  mollify  it  to  obtain  a  classical  subsolution. 

3.8.1  Construction  of  piecewise  afRne  subsolutions 

As  we  will  see,  what  is  needed  is  a  piecewise  affine  subsolution  W  with  the 
following  properties. 

1.  The  function  W  can  be  written  as  IT  =  Wi  A  W2  A  W3  where  Wk  is 
an  affine  function  for  each  k  =  1,  2,  3. 

2.  D  is  divided  into  three  regions  i?i,  R2,  and  R3,  such  that  in  each 
region  Rk,  W  =  Wk. 

3.  The  subsolution  property  H{DW{x))  =  H{DWk{x))  >  0  holds  for 
every  x  in  the  interior  of  region  R^- 

4.  The  Dirichlet  boundary  inequality  W {x)  <  0  for  x  £  d^- 

5.  The  Neumann  boundary  inequality  {DW{x),  df  >  0,  whenever  x  £  di 
and  DW{x)  exists. 

This  can  be  easily  achieved  -  indeed,  fixing  an  arbitrary  5  >  0,  one  can 
let,  for  each  k, 

wi (x)  =  {rk,  x)  +  27  -  kd,  (3.12) 

where  the  are  depicted  in  Figure  3.  It  is  not  difficult  to  check  that 

=  Wf  A  wi  A  wi 
satisfies  all  the  requirements  for  all  small  <5  >  0. 

Reruark  3.8.  As  will  be  discussed  further  in  Remark  3.15,  the  value  of  the 
subsolution  W^  at  x  =  0  is  important  and  we  would  like  it  to  be  as  close  to 
the  optimal  decay  rate  as  possible.  But 

iy‘^(0)  =  27  —  35  =  “optimal  decay  rate”  —  3(5. 

Therefore,  5  will  be  taken  as  a  small  positive  number. 
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5 

d2 


Figure  4:  Piecewise  afSne  subsolution 

Remark  3.9.  Asymptotically  optimal  schemes  can  be  found  by  letting  e 
and  6  depend  on  re.  This  is  discussed  in  Subsection  3.8.5. 

Remark  3.10.  The  failure  of  the  boundary  inequality  along  the  xi  axis, 
which  corresponds  to  the  existence  of  a  boundary  layer  in  the  prelimit  which 
vanishes  in  the  limit,  requires  the  introduction  of  W2,  which  perturbs  the 
gradient  in  a  neighborhood  of  this  axis.  A  similar  perturbation  is  not  re¬ 
quired  along  the  X2  axis,  since  the  boundary  inequality  already  holds  there. 
ITg  is  introduced  to  ensure  that  both  boundary  conditions  hold  in  a  neigh¬ 
borhood  of  the  origin. 

Remark  3.11.  There  are  many  different  choices  in  the  construction  of  piece- 
wise  affine  subsolutions.  For  example,  if  one  replaces  r2  by  f2  (see  Figure  3) 
in  the  dehnition  of  IT^  >  then  the  resulting  function  will  also  have  the  desired 
properties.  This  flexibility  can  be  exploited  to  simplify  the  construction  of 
schemes,  though  as  one  might  expect  the  particular  choices  will  have  some 
effect  on  the  performance,  and  indeed  arguments  can  be  made  to  support 
particular  choices  on  this  basis.  However,  a  discussion  on  this  issue  is  too 
detailed  for  the  present  paper,  and  we  refer  to  the  reader  to  [16]  for  more 
information. 

3.8.2  Mollification 

There  are  different  ways  to  mollify  the  piecewise  affine  subsolution  .  We 
will  adopt  a  mollification  called  exponential  weighting  that  is  specialized  here 
to  the  minimum  of  a  finite  set  of  smooth  functions.  For  future  reference,  we 
describe  the  mollification  in  its  general  form. 
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Consider  a  finite  collection  of  continuously  differentiable  functions  {/ii,  /i2, . . . ,  hx} 
and  let 

h  =  hi  A  h2  /\  ■■■  /\  hx  ■ 

Fix  a  small  positive  number  e  and  define 

h^[x)  =  -e log ^ exp  I --hk{x)  I  . 

fc=i  ^  J 


We  have  the  following  result,  whose  proof  is  straightforward  and  can  be 
found  in  [9,  Section  3.3]. 

Lemma  3.12.  For  any  e  >  0,  is  continuously  differentiable  with 


K 

Dhffx)  =  ^pl{x)Dhk{x), 

k=l 


where 

^  exp{-h^(a;)/e} 

EfcLiexp{-/ifc(x)/e} 
Furthermore,  we  have  the  uniform  bounds 

—Ke  <  h^{x)  —  h{x)  <  0 


for  every  x. 

Note  that  p‘^{x)  =  (pffx) ,  p^ix) , . .  .,pf^{x))  defines  a  probability  vector  in 
the  sense  that  p%{x)  >  0  and 


K 

=  1. 

k=l 

Remark  3.13.  A  well-known  general  mollification  method,  as  in  the  clas¬ 
sical  PDF  literature,  is  to  integrate  h  against  a  smooth  convolution  kernel. 
However,  we  do  not  recommend  such  an  approach  because  it  involves  nu¬ 
merical  integrations,  which  can  be  computationally  demanding,  especially 
in  high  dimensions.  In  contrast,  the  computations  involved  in  exponential 
weighting  are  simple  and  easy  to  implement. 
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3.8.3  The  classical  subsolution 

Applying  this  mollification  to  ,  we  define 


k=l 


(3.13) 


Thanks  to  Lemma  3.12,  is  continuously  differentiable  with 


3 


DW^’\x)  = 


(3.14) 


k=l 


where 


exp  [-W-{x)/e] 


(3.15) 


is  not  pre¬ 


cisely  a  classical  subsolution,  but  only  approximately.  Indeed,  Lemma  B.l 
states  that  the  Neumann  boundary  conditions  (HVL^’*^,  dj)  >  0  are  not  sat¬ 
isfied  for  X  €  di-  However,  the  lemma  also  indicates  that  they  are  “approx¬ 
imately”  satisfied  in  the  sense  that,  for  x  G  di, 


{DW^’\x),di)  >  -e 


for  some  small  positive  number  e  as  long  as  e/6  is  chosen  small.  The  reason 
for  this  violation  of  the  subsolution  property  is  that  the  exponential  weight¬ 
ing  is  not  a  “local”  smoothing.  It  can  be  avoided  if  one  uses  integration 
against  a  convolution  kernel  with  small  support,  but  the  advantages  of  the 
exponential  weighting  outweigh  the  minor  additional  complications  in  the 
analysis  introduced  by  this  error. 

3.8.4  The  importance  sampling  estimator  and  its  asymptotics 

For  each  k,  let  0^  is  the  saddle  point  change  of  measure  that  corresponds 
to  the  affine  function  Wk,  or  equivalently, 

0^  =  0*(Wfc)  =  0*(rfc)  eiP+(V), 

where  0*(-)  is  as  defined  in  Proposition  3.4.  Straightforward  calculation 


yields  that 


01  —  fJ-i,  A),  0 
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The  change  of  measure  based  on  the  is  just  a  state-dependent  mix¬ 
ture  of  0^.  More  precisely,  define  a  stochastic  kernel  by 

0^’^[-|x]  =  Y,pf{x)Ql  E  1P+(V),  (3.16) 

k=l 

and  for  each  fixed  n,  let 

0n[.|.]  =  0e,5[.|.]_  (3J7) 

In  other  words,  the  importance  sampling  algorithm  simulates  y(A:  -|-  1),  con¬ 
ditional  on  the  sample  history  {Y{j)  ■  1  <  j  <  k},  from  the  distribution 
0^’‘^[-|X"(/c)],  where  X”  is  the  state  process  as  defined  in  (3.4).  The  impor¬ 
tance  sampling  estimator  pn  is  then  given  by  (3.5). 

We  have  the  following  result  regarding  its  asymptotic  performance,  whose 
proof  is  deferred  to  Appendix  B. 

Theorem  3.14.  There  exist  a  pair  of  positive  eonstants  {A,  B)  that  only 
depend  on  the  system  parameters  (A,/ri,//2)  sueh  that,  provided  e/5  <  B, 
the  seeond  moment  of  the  importanee  sampling  estimator  pn  satisfies 

liminf - log[2nd  moment  of  pn]  >  27  —  F{e,  6), 

n  n 

where 

F{e,  (5)  =  3e  -|-  3(5  -|-  Aexp{— (5/e}. 

Since  27  is  the  optimal  decay  rate  for  the  second  moment,  the  theorem 
suggest  that  the  importance  sampling  algorithm  is  nearly  asymptotically 
optimal  as  long  as  F{e,  6)  is  made  small.  This  can  be  achieved  if  one  chooses 
(5  small,  and  e  small  compared  to  5. 

Remark  3.15.  The  proof  of  Theorem  3.14  indeed  shows  that  the  asymptotic 
decay  rate  of  the  second  moment  of  pn  is  bounded  from  below  by 

VT^’‘^(0)  —  Aexp{—5/e}. 

The  presence  of  the  term  Aexp{— (5/e}  is  due  to  the  fact  that  is  only 
“approximately”  a  classical  subsolution;  see  Section  3.8.3.  It  says  that  the 
performance  of  an  importance  sampling  scheme  based  on  a  subsolution  is 
largely  characterized  by  the  value  of  the  subsolution  at  x  =  0.  Similar  results 
were  obtained  in  [9,  10]. 
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Remark  3.16.  The  formula  of  F  also  provides  an  interesting  relation  be¬ 
tween  £  and  5.  For  each  fixed  small  e,  it  is  not  difficult  to  check  that  F(e,  •) 
is  minimized  at 

6  =  —£  log  £  +  £  log  ~  — e  log  e. 

O 

This  suggests  that  a  good  strategy  is  to  set  6  =  — eloge.  Note  that  in  this 
case,  when  e  is  small,  so  is  6,  even  though  6  is  comparatively  much  larger. 

3.8.5  Asymptotic  optimality 

The  previous  section  provides  a  nearly  asymptotically  optimal  importance 
sampling  algorithm.  It  is  good  enough  for  many  practical  purposes  where 
n  is  large  but  not  exceedingly  large.  However,  one  would  still  like  to  see  an 
algorithm  that  gives  optimality.  This  only  requires  a  slight  modihcation. 

Instead  of  using  a  fixed  pair  of  parameters  e  and  6  for  all  n,  we  now  allow 
them  to  vary  depending  on  n  and  denote  them  by  £„  and  6n-  For  each  re, 
we  use  the  change  of  measure  based  on  That  is,  for  each  re,  define 

the  stochastic  kernel 

3 

^  G  T+(V),  (3.18) 

k=l 

and  let 

©"[•I']  =  (3.19) 

Abusing  the  notation  a  bit,  we  again  denote  by  pn  the  corresponding  im¬ 
portance  sampling  estimator. 

Theorem  3.17.  The  importance  sampling  estimator  pn  is  asymptotically 
optimal,  that  is 

lim - log  [2nd  moment  of  pn]  =  27, 

n  re 

provided  that  6n  — >  0,  £nldn  — >  0,  and  ree^  ^  00. 

Remark  3.16  suggests  that  a  good  choice  is  to  set  6n  =  —£n^og£n-  In 
this  case,  asymptotic  optimality  follows  if  — >  0  and  ree„  ^  00. 

3.8.6  Further  remarks  on  the  importance  sampling  algorithms 

The  computation  of  the  weights  or  is  very  simple.  As  a 

consequence,  the  dynamic  importance  sampling  algorithms  based  on  (3.16)- 
(3.17)  or  (3.18)-(3.19)  are  practically  as  fast  as  the  standard  heuristic  where 
a  constant  change  of  measure  is  used. 
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It  is  possible  that  one  can  associate  other  changes  of  measure  with  sub¬ 
solutions.  For  example,  one  can  define  0"[-|x]  =  &* {DW^’^ {x))  in  lieu  of 
(3.16)-(3.17),  and  the  resulting  algorithms  will  have  similar  asymptotic  per¬ 
formance.  However,  the  use  of  mixtures  such  as  (3.16)  is  computationally 
more  convenient.  This  is  especially  the  case  when  the  change  of  measure 
is  not  easily  obtainable.  For  example,  for  a  system  with  Markov  modu¬ 
lated  arrival  and  service  rates,  the  computation  of  the  change  of  measure 
appropriate  to  a  single  gradient  p  requires  solving  an  eigenvalue/eigenvector 
problems.  If  we  smooth  first  and  then  compute  the  change  of  measure  suit¬ 
able  for  each  point  x,  then  many  such  problems  must  be  solved.  In  contrast, 
mixtures  like  (3.16)  only  require  the  computation  of  the  changes  of  measure 
that  correspond  to  the  finite  collection  of  vectors  r^. 

3.9  Numerical  results 

In  this  section  we  present  some  numerical  results  in  the  case  where  A  =  0.1, 
fJ-i  =  IJ-2  =  0.45.  The  importance  sampling  algorithm  based  on  the  standard 
heuristic  [14] ,  which  amounts  to  exchanging  the  arrival  rate  and  the  smallest 
service  rate,  leads  to  estimators  with  infinite  variance  [3].  For  comparison, 
the  theoretical  value  of  pn  is  obtained  by  iteratively  solving  the  linear  system 
of  equations  that  characterize  this  probability,  an  approach  that  is  feasible 
when  the  system  is  sufficiently  small. 

In  the  simulations,  we  always  set  6  =  —e  loge.  This  choice  was  suggested 
by  Remark  3.16,  and  was  experimentally  observed  to  be  a  good  choice  for 
small  e.  We  ran  simulations  for  n  =  20,  with  e  =  0.01,0.02,  and  0.03, 
respectively.  For  each  e  we  present  two  estimates  and  each  estimate  consists 
of  20,000  replications.  The  theoretical  is  pn  =  6.0  x  10“^^. 

In  all  the  tables,  “Std.  Err.”  stands  for  “Standard  Error”  and  “C.I.”  for 
“Confidence  Interval”.  The  performance  of  the  dynamic  importance  sam¬ 
pling  schemes  based  on  subsolutions  is  stable  across  different  simulations, 
with  estimates  that  are  close  to  the  theoretical  value  and  having  small  stan¬ 
dard  errors. 


£  =  0.01 

e  =  0.02 

e  =  0.03 

No.l 

No.  2 

No.l 

No. 2 

No.l 

No.  2 

Estimate  (x  10 

5.7 

5.5 

5.8 

6.1 

6.3 

5.8 

std.  Err.  (xlO"^^) 

0.4 

0.3 

0.3 

0.5 

0.4 

0.2 

95%  C.I.  (xlO-i^) 

[4.9,  6.4] 

[4.9,6. 1] 

[5. 2,6. 4] 

[5.1,7.!] 

[5.5,  7.1] 

[5. 3,6. 3] 

Table  1.  IS  based  on  subsolutions,  two- node  tandem,  total  population  overflow. 


Below  are  more  simulation  results  with  n  =  30,  40,  50,  with  e  =  0.02  and 
6  =  — eloge.  Each  estimate  consists  of  20,000  replications. 
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n  =  30 

n  =  40 

n  =  50 

Theoretical  value 

2.63  X  10-^« 

1.03  X  10-^« 

3.80  X  10-^^ 

Estimate 

2.73  X  10“^“ 

1.05  X  10“^"^ 

3.75  X  10-^^ 

Std.  Err. 

0.18  X  10"^“ 

0.03  X  10-^"‘ 

0.16  X  10“^^ 

95%  C.I. 

[2.37,3.09]  X  10-^« 

[0.99,1.11]  X  10-^"^ 

[3.43,4.07]  X  10“^^ 

Table  2.  IS  based  on  subsolutions,  two-node  tandem,  total  population  overflow. 


Remark  3.18.  It  is  not  difficult  to  check  that  the  “thickness”  or  the  height 
of  the  boundary  region  R2  (see  Figure  4)  is  <5/(27).  Since  in  Figure  4  we  are 
scaling  the  queue  sizes  by  a  factor  n,  the  thickness  of  the  boundary  region 
in  the  prelimit  will  be  n5/{2^)  when  unsealed.  However,  the  optimality 
conditions  ne^  — >  00  and  e„/<5n  — >  0  in  Theorem  3.17  imply  that  n6n  — >  00. 
This  does  not  allow  the  boundary  region  to  be  too  thin  in  prelimit.  The 
need  for  such  control  is  supported  by  experimentation,  which  shows  that  for 
a  fixed  n,  the  simulation  results  tend  to  deterioriate  when  e  is  too  small. 

4  Extensions  to  d-node  tandem  Jackson  networks 

The  work  on  the  two-node  tandem  Jackson  network  can  be  easily  extended 
to  <i-node  tandem  Jackson  networks  and  more  general  exit  probabilities.  To 
be  more  precise,  consider  a  <i-node  tandem  Jackson  network  with  Poisson 
arrival  rate  A  and  consecutive  exponential  service  rates  ni, ,  Hd.  The  state 
of  the  network  is  described  by  the  embedded  Markov  chain  Z  =  {Z(k)}  = 
{{Zi{k), . . . ,  Zd{k))},  where  Zi  denotes  the  queue  length  at  node  i.  The 
system  is  assumed  to  be  stable,  that  is,  A  <  min{//i, . .  .,/id}.  Let  F  be  a 
closed  subset  of  such  that  0  0  F  and  the  closure  of  \  F  is  compact. 
We  are  interested  in  the  following  rare-event  probability 

Pn  =  PjProcess  Z  hits  set  reF  before  returning  to  0,  starting  from  0}. 

Without  loss  of  generality,  we  assume  that  X  +  +  ■  ■  ■  +  pd  =  ^-  We  also 

assume  that  pn  decays  exponentially  with 

lim--logp„  =  7. 
n  n 

4.1  Isaacs  equation  and  the  Hamiltonian 

In  order  to  write  down  the  Isaacs  equation  associated  with  this  problem, 
we  introduce  the  following  notation.  The  increments  of  Z  take  values  in 
V  =  {uo,  vi, ... ,  Vd}  where  the  Uj’s  are  d-dimensional  vectors  defined  by 

(  -1,  if  /  =  * 

uo  =  (1,  0, . . .,  0),  =  <  1,  if  j  =  i  +  I  and  j  <  d  . 

[  0  ,  otherwise 
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vq  corresponds  to  an  arrival  and  Vi  to  a  service  at  node  i.  Similar  to  (3.4), 
the  scaled  state  process  X'^  =  Zjn  satisfies 

X^{k  +  1)  =  X^{k)  +  -7r[X^{k),Y{k  +  1)], 
n 

where  vr  plays  the  same  role  as  in  (3.3).  The  sequence  {Y(k)}  consists  of  iid 
random  variables  taking  values  in  V  with  common  distribution 

0  =  (A,^i,...,/irf)  gT+(V). 

Define  the  regions 

D  =  {x  G  :  X  0  r,  Xj  >  0,  z  =  1, . . . ,  d}, 

di  =  {x  G  :  X  0  r,  Xj  =  0},  z  =  1, . . . ,  d, 

and  the  directions  of  constraints 


di  =  -Vi. 


The  Isaacs  equation  is  just  Il{DW)  =  0,  where 


H(p) 


sup  inf 
0eCP+(v)  ^'ey+iv) 


{p,¥{e))  +  '^e[vi]  log^l^ 

i=o 


i?(d||0) 


^(0)  =  '^0[vi]  -  Vi 
i=0 

for  every  9  G  1P'’'(V).  The  boundary  conditions  are  IT(x)  =  0  for  x  G  T  and 
{DW{x),  di)  =  0  for  X  G  di. 

The  following  result  is  an  extension  of  Proposition  3.4,  whose  proof  is 
very  similar  and  thus  omitted. 

Proposition  4.1.  For  every  p  G  there  exists  a  saddle  point  for  the 
Hamiltonian  H,  say  {&* (p) ,  9* (p))  G  1P’'‘(V)  x  1P+(V),  given  by 


0*(p)N  =  9*{p)[vi\  =  N{p)  ■  0[z;i]  exp{-(p,z;j)/2}. 


where 


N{p)  = 


^0[z;j]  exp{-(p,z;j)/2} 
j=o 


Moreover,  the  Hamiltonian  H  is  coneave  and  H(p)  =  2  log  N  (p) . 
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4.2  Subsolutions  and  importance  sampling  schemes 

The  construction  of  subsolution  also  proceeds  in  a  similar  fashion:  we  start 
with  a  piecewise  smooth  “subsolution”  and  then  mollify  it  by  exponential 
weighting.  We  will  discuss  the  general  case  where  the  subsolutions  can  vary 
depending  on  n.  To  be  more  specific,  let  (W^, . . . ,  W^)  be  smooth  functions 
(preferably  affine  functions)  and  let 

W”  =  Wf  A  •  •  •  A  W'^. 


The  choice  of  {WJ}}  should  have  the  following  properties: 

1.  Il{DW^{x))  >  0  for  every  x  &  D  and  every  k, 

2.  W'^{x)  <  0  for  every  a:  G  T, 

3.  for  X  on  boundary  dj,  {DW^{x),  di)  >  0  when  DW^{x)  is  well  defined. 


Fix  >  0.  The  exponential  weighting  produces  a  smooth  mollification 
of  IT"  by 


Similar  to  the  proof  of  Lemma  B.l,  it  is  not  difficult  to  show  that,  thanks 
to  the  concavity  of  H  and  Lemma  3.12,  satisfies  H(L)lT^’"(a:))  >  0 

for  X  G  H  and  iy^’”(x)  <  0  for  every  x  G  T.  However,  (Z)VF^"’"'(x),  di)  >  0 
may  not  hold  for  x  G  dj.  But  we  should  expect  these  boundary  inequalities 
to  be  true  at  least  approximately,  thanks  to  the  third  property  of  {WJ}}. 

For  each  n,  the  importance  sampling  change  of  measure  based  on 
is  as  follows.  Let 


exp  [-W^{x)/en} 
Ef=iexp{-W^"(x)/en}’ 


and 

0*’"(x)  =  0*(ZHTfc-(x))GT+(V) 

where  0*(-)  is  as  defined  in  Proposition  4.1.  The  importance  sampling 
change  of  measure  is  determined  by  the  stochastic  kernel 


e”[.|.|  ^  e  3>+(v). 

k=l 
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That  is,  the  conditional  distribution  of  T(A;  +  1),  given  the  sample  history 
{F(l), . .  .,Y{k)},  is  just  0”[-|X”(A:)].  We  denote  by  pn  the  corresponding 
importance  sampling  estimator. 

The  following  result  is  an  extension  of  Theorem  3.17.  The  proof  is  very 
similar  and  thus  omitted. 

Theorem  4.2.  We  assume  that  {W^{x)}  has  uniformly  bounded  first  and 
second  derivatives  for  x  &  D  and  that  there  exists  Sn  >  0  such  that  for 
X  E  di,  {DW^'^''^{x),di)  >  —En-  We  also  assume  that  liminf„  iy"(0)  >  27. 
Then  the  importance  sampling  estimator  pn  is  asymptotically  optimal,  i.e., 

lim - log  [2nd  moment  of  pfi  =  27, 

n  n 

provided  that  En  ^  0,  — >  0,  and  uEn  00. 

In  the  previous  two-node  tandem  network  we  have  taken  Wfi{x)  = 
(rfc,  x)  -|-  27  —  k6n-  For  this  choice  we  can  set  En  =  27exp{— thanks 
to  Lemma  B.l,  and  Theorem  4.2  reduces  to  Theorem  3.17. 

Remark  4.3.  One  can  also  write  down  a  result  similar  to  Theorem  3.14 
for  the  case  where  WJf  =  Wk  and  En  ^  e,  En  ^  e.  The  corresponding 
importance  sampling  estimator,  still  denoted  by  pn,  will  satisfy 

lim  inf  — —  log[2nd  moment  of  Pn]  >  IT(0)  —  {Ke  -|-  Ce) 
n  n 

where  C  is  a  constant  only  depends  on  the  system  parameter  0,  under  the 
condition  that  e  is  small  enough. 

4.3  Example  and  numerical  results 

In  this  section  we  study  two  examples:  the  individual  buffer  overflow  for 
two-node  tandem  Jackson  network  and  total  population  overflow  for  d-node 
tandem  Jackson  network. 

4.3.1  Two-node  tandem  networks  with  individual  buffer  overflow 

In  this  section  we  consider  the  two-node  tandem  queueing  (d  =  2)  networks 
with  0  =  {X,  pi,  P2),  and  the  quantity  of  interest  is 

Pn  =  {size  of  queue  1  exceeds  Bin  or  size  of  queue  2  exceeds  B2n 

before  the  system  returns  to  empty  state,  starting  from  0}. 
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One  can  think  of  Bin  as  the  individual  buffer  size  for  node  i.  In  the  notation 
we  just  introduced,  it  amounts  to  F  =  {x  G  '■  xi  >  Bi  or  X2  >  -62}- 
Assuming  A  +  //i  +//2  =  1  and  A  <  min{//i,  112},  we  have  (following  a  similar 
argument  in  [11]) 

7  =  lim - logpn  =  lain  Bi  log 

n  n  i=l,2  A 

We  consider  piecewise  affine  subsolutions  that  take  the  form  W”  =  W”  A 
W2  A  where 

^kix)  =  (^fc,  x)  +  27  -  k6n, 

for  some  small  positive  constants  6n-  The  choice  of  {r^}  and  its  correspond¬ 
ing  change  of  measure  {0*(7’fc)}  are  given  in  the  table  below. 


92 


92 


Figure  5:  The  choice  of  {rj,} 


It  is  not  difficult  to  check  that  H{DW^{x))  =  H{rk)  =  0  and  the 
function  W'^  equals  Wk  in  region  (see  the  figure  below).  Furthermore, 
iy”(x)  <  0  for  every  x  G  F  and  {DW'^ (x) ,  di)  >  0  whenever  x  G  9*  and 
DW^  is  well  defined.  As  in  Lemma  B.l,  it  is  also  simple  to  show  that  the 
exponential  weighting  mollification  satisfies 

(iAVF^"’”(x),di)  >  -en  =  -21og[(/ii  V /i2)/A]  exp{-(I„/e„}  (4.1) 
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for  X  £  di-  Thanks  to  Theorem  4.2,  the  importance  sampling  estimator  is 
asymptotically  optimal  if  dn  —>■  0,  Enj^n  — >  0,  and  ne„  ^  oo. 


Ml  >  M2 


Ml  <  M2 


Figure  6:  Piecewise  affine  function 


4.3.2  d-node  tandem  networks  with  total  population  overflow 

In  this  section  we  consider  the  total  population  overflow  for  a  d-node  tandem 
Jackson  network  with  d  >  2,  that  is,  T  =  {x  £  MJ"  :  xi  +  X2  +  ■  ■  ■  +  Xd> 
and 


=  P  {network  total  population  reaches  n  before  returning  to  0, 
starting  from  0}  . 

Specializing  to  the  case  d  =  2  (and  assuming  /ii  >  H2),  the  results  stated 
in  this  section  coincide  with  those  of  Section  3.  Let  /2  =  /ri  A  //2  •  •  •  A  fid- 
Assuming  \  <  Jl  and  A  +  /ri  +  •  •  •  +  //^  =  1,  we  have  [11] 

7  =  lim  --  logpn  =  log  Y- 
n  n  A 

Note  that  unlike  Section  3,  there  is  no  assumption  here  that  the  service 
rates  be  ordered  in  any  way.  That  assumption  was  used  in  Section  3  only 
to  simplify  the  presentation. 

For  any  fixed  n,  we  consider  piecewise  affine  subsolutions  of  form  IT”  = 
Wi  A  •  •  •  A  W^j^i  where 

(^)  =  (^fc,  x)  +  27  -  k5n 


for  some  small  positive  constant  and 


f  —27,  \il<i<d  +  l  —  k 

\  0  ,  otherwise 
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for  1  <k  <  d  and  r^+i  =  0.  The  change  of  measure  corresponding  to  is 


0*(rfc)  =  1  -  {fid+i-k  -  IJ-)^ 

I  k- 


for  1  <  fe  <  d,  and 


0*(rrf+i)  =  0  =  (A,  //i, . . . ,  lid)- 

We  have  the  following  lemma,  whose  proof  is  deferred  to  Appendix  D. 
Lemma  4.4.  The  following  properties  hold: 

1.  ]HI(rfc)  >  0  for  every  k, 

2.  W'^{x)  <  0  for  all  x  G  T, 

3.  if  X  G  di  is  such  that  DW^{x)  is  well  defined  then  {DW^{x),  df)  >  0, 

4-  ifW^^’'^  denotes  the  exponential  weighting  ofW^  with  as  the  mol¬ 
lification  parameter,  then 


{DW^^’"‘{x),di)  >  -Sn  =  -27exp{-d„/e„} 


for  every  x  G  di. 

Invoking  Theorem  4.2,  the  importance  sampling  schemes  corresponding 
to  are  asymptotically  optimal  if  6n  — >  0,  Enldn  — >  0,  and  ne^  — >  oo. 

4.3.3  Numerical  results 

For  all  the  simulations  in  this  section,  we  set  6  =  — eloge.  The  justification 
for  this  choice  is  based  on  an  argument  analogous  to  that  of  Remark  3.16. 

Consider  the  example  of  a  two-node  tandem  queue  with  individual  buffer 
overflows  as  presented  in  Section  4.3.1.  For  the  case  of  fii  >  ji2,  we  set 
A  =  0.1,  /ii  =  0.5,  ii2  =  0.4,  and  Bi  =  0.9,  B2  =  1.  Simulations  are 
generated  for  n  =  20,  30,  40  with  e  =  0.01.  Below  are  the  numerical  results. 
Each  estimate  consists  of  20,000  replications.  Again,  for  comparison  the 
theoretical  value  is  obtained  using  an  iterative  algorithm. 
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u  =  20 

n  =  30 

n  =  40 

Theoretical  value 

4.81  X  10-^'-^ 

3.97  X  10-^« 

3.47  X  10-^"^ 

Estimate 

4.83  X  10“^^ 

4.04  X  10“^“ 

3.64  X  10“^"^ 

Std.  Err. 

0.20  X  10"^^ 

0.15  X 

0.18  X  10“^"^ 

95%  C.I. 

[4.43,5.23]  X  10“^^ 

[3.74,4.34]  X  10“^“ 

[3.28,4.00]  X  10“^"^ 

Table  3.  IS  based  on  subsolutions,  two-node  tandem,  individual  buffer  overflow,  fii  >  fi2 


For  the  case  of  /ti  <  /t2,  we  set  A  =  0.05,  =  0.35,  /i2  =  0.6,  and  Bi  =  1, 

B2  =  0.6.  We  run  simulations  for  n  =  20,30,40  with  e  =  0.1,  and  each 
estimate  consists  of  20,000  replications. 


n  =  20 

n  =  30 

n  =  40 

Theoretical  value 

1.44  X  10“^^ 

4.82  X  lO”^'^ 

1.61  X  10“^^ 

Estimate 

1.40  X  10"^^ 

5.01  X  lO"^'^ 

1.85  X  lO-^'^ 

std.  Err. 

0.05  X  10-^^ 

0.29  X  10-^« 

0.21  X  lO-^'^ 

95%  C.I. 

[1.30,1.50]  X  10-^^ 

[4.43,5.59]  X  10“^^ 

[1.43,2.27]  X  10“^^ 

Table  4.  IS  based  on  subsolutions,  two-node  tandem,  individual  buffer  overflow,  fii  <  jj,2 


As  for  the  total  population  overflow  for  general  d-node  tandem  networks 
in  Section  4.3.2,  we  run  simulations  for  d  =  4  and  d  =  9.  For  d  =  4,  we 
set  A  =  0.04,  //I  =  •  •  •  =  /t4  =  0.24,  and  run  simulations  for  n  =  20,  25,  30 
with  £  =  0.1.  Again,  each  estimate  consists  of  20,  000  replications,  and  the 
theoretical  value  is  obtained  using  an  iterative  algorithm. 


u  =  20 

n  =  25 

n  =  30 

Theoretical  value 

2.04  X  10“^^ 

5.02  X  10“^'= 

1.10  X  10“^^ 

Estimate 

2.05  X  10-^^ 

5.07  X  10-^'^ 

1.08  X  10-^y 

std.  Err. 

0.04  X  10"^^ 

0.07  X  10"^'’ 

0.03  X  lO”^'^ 

95%  C.I. 

[1.97,2.13]  X  10“^^ 

[4.93,5.21]  X  10“^'" 

[1.02,1.14]  X  10“^“ 

Table  5.  IS  based  on  subsolutions,  five-node  tandem,  total  population  overflow. 


For  d  =  9,  we  set  A  =  0.01,  /ii  =  •  •  •  =  /ig  =  0.11,  and  run  simulations 
for  n  =  20,  25,  30  with  e  =  0.12.  Each  estimate  consists  of  100,000  replica¬ 
tions.  In  this  case,  a  benchmark  value  is  obtained  using  the  same  dynamic 
importance  sampling  algorithm  but  with  10  million  replications  (the  itera¬ 
tive  algorithm  for  computing  the  theoretical  value  in  the  case  of  d  =  4  does 
not  work  here  because  the  state  space  is  too  large). 


n  =  20 

n  =  25 

n  =  30 

Benchmark  value 

3.18  X  10“^"^ 

9.41  X  10“^^ 

2.16  X  10“^^ 

Estimate 

2.93  X  10“^"^ 

10.80  X  lO"^'^ 

1.98  X  10"^^ 

Std.  Err. 

0.23  X  10-^"‘ 

1.30  X  10“^^ 

0.30  X  10“^^ 

95%  C.I. 

[2.47,3.39]  X  10-^^ 

[8.20,13.10]  X  10-^y 

[1.38,2.58]  X  10-^^ 

Table  6.  IS  based  on  subsolutions,  nine-node  tandem,  total  population  overflow. 
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5  Remarks  on  general  queueing  networks 


It  is  possible  to  formulate  a  result,  analogous  to  Theorem  4.2,  that  applies  to 
general  open  Jackson  networks.  However,  due  to  space  limitations,  we  only 
present  a  simple  example  to  illustrate  the  main  idea  and  refer  the  interested 
readers  to  [16]  for  a  more  general  theorem.  The  major  difference  between 
the  theory  developed  in  Sections  3  and  4,  which  was  adequate  for  tandem 
networks,  and  that  of  the  present  section,  is  that  Neumann-type  boundary 
conditions  are  not  sufficient  anymore,  and  one  has  to  consider  the  more 
elaborate  boundary  Hamiltonians. 

Consider  the  following  two-node  Jackson  network  with  feedback.  Again 
assume  Poisson  arrivals  with  rate  A  and  consecutive  exponentially  services 
with  rate  at  node  i.  However,  after  being  served  at  node  2,  a  job  has 
probability  f3  to  be  returned  to  node  1. 


A 

r 


Figure  7:  Two-node  network  with  feedback 


Suppose  that  the  quantity  of  interest  is  the  probability  of  total  popula¬ 
tion  overflow. 


=  P  {network  total  population  reaches  n  before  returning  to  0, 
starting  from  0}  . 


Let  /r  =  /ti  A/i2-  Assuming  the  stability  condition  A  <  /i(l  — /3),  and  without 
loss  of  generality,  A  -|-  /ri  -|-  //2  =  1  j  we  have  [11] 


7  =  lim - \ogpn 

n  n 


log 


A 


The  goal  is  to  find  an  efficient  importance  sampling  scheme  for  the  estimation 
of  pn- 


5.1  System  dynamics 

Let  Z  =  {Z(k)}  be  the  embedded  discrete  time  Markov  chain  that  repre¬ 
sents  the  queue  lengths  at  the  transition  epochs  of  the  network.  Then  the 
dynamics  of  Z  can  be  modeled  by 

Z{k  +  1)  =  Z{k)  +  TT[Z{k),Y{k  +  1)] 
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where  {Y{k)}  are  iid  random  variables  taking  values  in 

V  =  {^;o  =  (1,0),  =  (-1,  l),V2  =  (0,  -1),U3  =  (1,  -1)}, 

and  the  mapping  vr  is  defined  as 

{0,  if  zi  =  0  and  y  =  vi 

0,  if  Z2  =  0  and  y  =  V2  or  vz  ■ 
y,  otherwise 

Under  the  original  probability  measure  P,  the  distribution  of  Y (k)  is  just 
0  =  (A,/ri,(l-/3)/i2,/3/i2)GlP+(V). 

See  Figure  8  for  an  illustration  of  the  boundary  dynamics. 

^2 


Figure  8:  State  dynamics 


5.2  The  Isaacs  equation  and  boundary  Hamiltonian 


We  use  the  same  notation  as  that  introduced  in  Section  3.4.  Following 
the  recipe  of  Section  3,  define  the  scaled  state  process  X'^{k)  =  Z{k)/n. 
Dynamic  importance  sampling  schemes  are  characterized  by  stochastic  ker¬ 
nels  0”[-|-]  on  V  such  that  the  conditional  distribution  of  Y{k  +  1),  given 
{Y{l),...,Y{k)},  is  just  &^[-\X^{k)]  E  1P+(U). 

Following  the  argument  in  Section  3.5,  one  can  write  down  the  Isaacs 
equation  M(DW{x))  =  0  for  x  €  D,  where 


H(p) 


sup  inf 
0eCP+(v)  ^'ey+iv) 


{p,¥{e))  +  '^9[vi]  log 


i=0 


0bi] 

0[uj] 


+  i?(0||0) 
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with 


3 

1F(6')  =  '^0[vi]  ■  Vi, 
i=0 

and  the  Dirichlet  boundary  condition  W{x)  =  0  for  x  £  d^- 

However,  as  far  as  the  boundaries  di  and  82  are  concerned,  the  Neumann- 
type  boundary  condition  {DW{x),di)  =  0  is  not  sufficient  (more  precisely, 
it  is  not  sufficient  for  82,  since  the  direction  of  constraint  is  not  well  defined 
on  82)-  Instead  one  has  to  resort  to  a  boundary  Hamiltonian,  which,  loosely 
speaking,  is  the  Hamiltonian  that  one  obtains  using  the  state  dynamics  on 
the  boundary  [6].  Consequently,  the  boundary  conditions  become 

Mg^{DW{x))  =  0,  ioT  X  €  8i,  i  =  1,2, 

where  the  boundary  Hamiltonian  is  defined  exactly  as  H  except  F(0)  is 
replaced  by  Fj(0)  with 

^  -  Vi. 

27^2,3 

Remark  5.1.  Proposition  4.1  can  be  easily  applied  to  the  interior  Hamilto¬ 
nian  H  and  the  boundary  Hamiltonian  to  show  the  existence  of  saddle 
points  and  the  concavity  of  these  Hamiltonians.  The  formulae  for  the  sad¬ 
dle  points  are  as  follows.  Let  (0*(-),  0*(-))  be  the  saddle  point  for  H,  and 
(0g,(-),  0g.{-))  be  the  saddle  point  for  Then 

Q*{p)  =  e*{p)  =  N{p)  ■  ,  (1  -  /3)/i2e^, , 

—  /  Pi  P2  P2~P1  \ 

0^1  (P)  =  %Ap)  =  ^i(p)  •  (Ae““,/ii,  {1-  (3) iX2e~ ,  (3 iX2e~^ J  , 

—  /  Pi  Pi  ~P2  \ 

^dsip)  =  ^kip)  =  ^2(p)  •  yXe~~ ,  ,  {1  -  (3)fj.2,  [3^2)  , 

where  N(p),  Ni{p)  are  normalizing  constants  so  that  all  these  vectors  are 
probability  vectors  (i.e.,  elements  in  !?■*■(¥)).  Moreover,  BI(p)  =  21ogN(p) 
and  Bla.(p)  =  21ogiVi(p). 

5.3  Piecewise  afRne  subsolutions  and  mollification 

The  definition  of  a  classical  subsolution  is  the  same  as  Definition  3.7,  ex¬ 
cept  that  Neumann  boundary  inequality  {DW{x),di)  >  0  is  replaced  by 
]HIai(L>IT(x))  >  0  for  x  e  8i,  i  =  1,  2. 
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The  construction  of  a  piecewise  affine  subsolution  is  very  similar  to  that 
in  Section  3.8.1.  Define 

ri  =  27(-l,-l),  r2  =  27(-l,0)  +  2(7-a)(0,-l),  r3  =  (0,0), 
where  a  is  given  by 

•  /  log[/ri/ (/ri  +  A  -  (1  -  13) H2)] ,  if  //i  >  H2 
1  log[/ri/(A  +  /3/ri)]  ,  if //i  < //2 


Ml  >  M2 


Ml  <  M2 


Figure  9:  The  Hamiltonians  and  the  choice  of  {r^} 


It  is  not  difficult  to  check  that  0  <  o  <  7.  Now  let  A  Wl  A  VFg 

where 


=  {ri,x)  +  2'^-5 
^2(^)  =  (r-2,  x)  +  27  -  2(5 
iy3^(x)  =  (r3,x)  +  27- (l  +  27/a)(5. 


The  exponential  weighting  of  with  parameter  e  yields  a  smooth  func¬ 
tion 

3  r  1 

W^'^{x)  =  — elog^exp  <  — Wl{x) 

k=i  ^  ^ 

that  satishes 


DW^^\x)  =  Y.pf{x)rk, 


k=l 


eyip{-W^{x)/e] 

ELiexp{-I^fe^(x)/e}' 


We  have  the  following  result,  whose  proof  is  deferred  to  Appendix  D. 
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1 


XI 


d2 

Figure  10:  The  piecewise  affine  subsolution 

Lemma  5.2.  For  each  k  we  have  ]HI(rfc)  >  0,  and  the  function  satisfies 

1.  >  0  for  X  €  D, 

2.  W^’^(x)  <  0  for  X  e  de, 

3.  for  each  i  =  1,2,  and  x  €  di, 

3 

mafiDW^^\x))  >  Y,Pk\^ndArk)  >  -Cexp{-5/e} 
k=l 

for  some  constant  C  that  only  depends  on  the  system  parameter  0. 

5.4  The  importance  sampling  scheme  and  its  asymptotics 

The  importance  sampling  scheme  based  on  is  as  follows.  Define  the 
stochastic  kernel  on  V  by 

3 

=  '^Pk\^)®*i'^k),  ilxeD 

k=l 

and 

3 

=  '^pf{x)&g^{rk),  if  X  G  di. 

k=l 

Here  the  formulae  for  0*  and  Qq,  can  be  found  in  Remark  5.1. 
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We  will  allow  e  and  6  to  be  n-dependent,  denoted  by  en,Sn,  and  let 
0”[-|-]  =  Denote  by  pn  the  corresponding  importance  sampling 

estimator.  We  have  the  following  result,  whose  proof  is  very  similar  to 
that  of  Theorem  3.17.  Indeed,  in  the  proof  of  Theorem  3.17,  the  Neumann 
boundary  condition  is  used  to  derive  (implicitly)  certain  inequalities  associ¬ 
ated  with  boundary  Hamiltonians.  Such  inequalities  can  now  be  obtained 
using  Lemma  5.2.  We  omit  the  details. 

Theorem  5.3.  The  importance  sampling  estimator  pn  is  asymptotically  op¬ 
timal  if  6n  0,  Enj^n  — >  0,  and  UEn  OO. 

0ne  can  also  use  a  fixed  pair  of  parameters  e  and  <5  for  all  n,  which  leads 
to  a  result  similar  to  Theorem  3.14  and  suggests  a  good  choice  may  be  to 
take  5n  =  -enloge„. 


5.5  Numerical  results 

For  all  the  simulations  in  this  section,  we  set  e  =  0.02  and  5  =  — eloge.  For 
the  case  of  pi  >  p2,  we  choose  A  =  0.1, //i  =  0.5, /i2  =  0.4,  and  /3  =  0.1. 
We  run  simulations  for  n  =  20,  30,  40  and  each  estimate  consists  of  20,000 
replications.  The  theoretical  value  is  obtained  using  a  numerical  iterative 
algorithm. 


n  =  20 

n  =  30 

n  =  40 

Theoretical  value 

9.60  X  10“^^ 

2.66  X  10-^'" 

7.27  X  10“^^ 

Estimate 

9.31  X  10"^^ 

2.60  X  10-^'" 

7.33  X  10“^^ 

Std.  Err. 

0.17  X  10-^^ 

0.07  X  10-^'" 

0.33  X  10-^^ 

95%  C.I. 

[8.97,9.65]  X  10-^^ 

[2.46,2.74]  X  10“^'" 

[6.67,7.99]  X  10-^^ 

Table  7.  IS  based  on  subsolutions,  two-node  tandem  with  feedback,  fii  >  fi2 


For  the  case  of  pi  <  p2,  we  choose  A  =  0.1,  /ii  =  0.43,  p2  =  0.47,  and 
P  =  0.2.  We  run  simulations  for  n  =  20,  30,  40  and  each  estimate  consists  of 
20,000  replications. 


n  =  20 

n  =  30 

n  =  40 

Theoretical  value 

4.39  X  10-^*^ 

2.13  X  10-^^ 

9.60  X  10-^^ 

Estimate 

4.62  X  10-^'J 

1.91  X  lO-^'^ 

9.88  X  10“^^ 

Std.  Err. 

0.46  X  10“^*^ 

0.13  X  10“^^ 

0.87  X  10“^^ 

95%  C.I. 

[3.70,5.54]  X  10-^'J 

[1.65,2.17]  X  lO-^'^ 

[8.14,11.64]  X  10“^^ 

Table  8.  IS  based  on  subsolutions,  two-node  tandem  with  feedback,  fii  <  fj,2 
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A  Appendix.  A  large  deviation  result 

In  this  appendix  we  prove  a  useful  large  deviation  result  that  may  be  of  some 
independent  interest.  For  completeness,  recall  the  definition  of  process  Z  by 
(3.2): 

Z{k  +  1)  =  Z{k)  +  7r[Z(A:),  Y{k  +  1)] 

where  {Y(k)}  is  a  sequence  of  iid  random  variables  taking  values  in  V  = 
{uo,  vi,  V2}  with  distribution  0  =  (A,  ni,  112)-  Define  the  hitting  times 

an  =  inf{A:  >  0  :  Zi{k)  +  Z2{k)  =  n}, 

(To  =  inf{fe  >  0  :  Zi{k)  +  Z2{k)  =  0}. 


These  quantities  differ  slightly  from  Tn  and  Tq,  which  are  defined  only  for 
the  initial  condition  0  and  which  use  >  0  in  the  definition  of  Tq.  To  ease 
notation,  let 

=  {(-^1,  Z2)  e'LX  ■.  Zi  +  Z2  <n}  . 

We  have  the  following  result. 


Proposition  A.l.  There  exists  a  eonstant  c  >  0,  whieh  only  depends  on 
the  system  parameter  (A,/ii,//2),  sueh  that 


lim  sup  sup  —  log  Ez 
n  z&Ln 


gC((T„Ao-o) 


<  00. 


Here  denotes  expeetation  eonditioned  on  Z{0)  =  z. 

The  main  difficulty  in  proving  such  a  result  is  that  the  definition  of 
(To  requires  that  the  state  process  hit  a  single  point,  and  that  it  is  not 
sufficient  to  consider  instead  a  small  neighborhood  of  this  point.  The  key 
idea  to  overcome  this  is  to  study  a  closely  related  one-dimensional  process. 
Let  S{z)  =  Ezlao]  for  every  2;  G  Z^.  S  is  finite,  thanks  to  the  stability 
assumption.  Define  the  process 


5(z(t)), 

(  (To  —  fe  ,  if  /c  >  (To 


In  other  words,  the  process  Q  is  random  until  the  process  Z  hits  the  origin, 
after  which  Q  becomes  deterministic  and  decreases  by  1  each  step.  The 
scaled  continuous-time  piecewise  affine  interpolation  process  is  just 

1  71  f  —  I  T?/"  I 

Qn{t)  =  -Q{[nt\)  + - ^  [Q{[nt\  +  1)  -  Q{[nt\)] , 
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for  t  >0. 

In  order  to  give  a  large  deviation  upper  bound  for  the  processes  {Qn}, 
we  need  the  following  dehnitions.  Fix  any  a  G  M.  For  each  z  G  Z^,  dehne 

h{z;  a)  =  logF;2exp{a((5(l)  -  (5(0))}  (A.l) 

and 

H{a)  =  sup  h{z]a).  (A. 2) 

Clearly,  H  is  convex  since  h{z]  ■)  is  convex  for  each  z.  The  convex  conjugate 
of  H  is  denoted  by  L,  or, 


L{f5)  =  sup  [a/3  —  H{a)\ .  (A. 3) 

q;GR 

The  function  L  is  non-negative  since  iil(O)  =  0,  and  it  will  serve  as  a  local 
rate  function.  For  any  hxed  time  T  >  0,  let  C([0,  T];  M)  be  the  Polish  space 
of  continuous  functions  on  interval  [0,  T]  equipped  with  the  supremum  metric 
p.  Dehne  a  mapping  It  '■  <^([0,  T];  M)  ^  M+  U  {ooj  by 

j  ^  f  /o^  A((^(t))  dt,  if  (j)  is  absolutely  continuous 

\  oo  ,  otherwise  ’ 

and  its  level  set 

$.(s)  =  {(^  G  C([0,  T];  M)  :  =  x,  It{^)  <  s} 

for  every  x  G  M  and  s  >  0. 

We  have  the  following  results,  whose  proofs  are  deferred  to  the  end  of 
this  appendix.  Proposition  A.l  is  a  consequence  of  these  lemmas. 

Lemma  A. 2.  There  exists  a  eonstant  M  >  0  such  that  S{z)  <  M{zi  +  Z2) 
for  every  z  G  Z^,  and  the  absolute  value  of  all  increments  of  {(5(A:)}  are 
uniformly  bounded  by  M . 

Lemma  A. 3.  Let  T  >  0  be  given. 

1.  lTi4>)  >  0  for  every  (j),  and  Iri'f)  =  0  z/  and  only  if  (j){t)  =  —1  for  a.e. 
te[o,T]. 

2.  There  exists  a  constant  K  such  that  It{4’)  “Is  finite  only  iffi  is  Lipschitz 
continuous  with  Lipschitz  constant  K . 

3.  Given  any  compact  set  F  C  M,  the  union  of  level  sets, 

compact  for  any  s  >  0.  In  particular,  It  is  lower  semicontinuous. 
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4-  For  any  h  >  0  and  s  >0,  we  have 

limsup  sup  -logP^  {p{Qn,  ^Siz)/n{s))  >  h}  <  -s. 
n  z&Ln 


Proof  of  Proposition  A.l.  Let  M  be  the  constant  in  Lemma  A. 2,  and  K 
be  the  Lipschitz  constant  in  Lemma  A. 3.  For  any  <5  >  0  and  T  >  0,  define 

=  {(^EC'([0,r];M):  <^(0)  G  [0,M],-5<<^<M  +  <5, 

(j)  is  absolutely  continuous,  |0|  <  A  V  m|  , 

which  is  a  compact  subset  of  (^([O,  T];  M).  Since  It  is  lower  semicontinuous 
by  Lemma  A. 3,  it  attains  its  minimum  on  F^.  However,  it  is  not  difficult  to 
see  that  >  0  for  any  (/>  E  if  T  >  M  +  (5.  Indeed,  suppose  =  0. 

Then  by  Lemma  A. 3  we  have  =  (/>(0)  —  t.  If  (/>(0)  E  [0,  M]  then  for  any 
M  +  5  <t  <T,  (j){t)  =  0(0)  —  t<  —6.  Thus  0  0  F^.  It  follows  that,  as  long 
asT  >  M  +  6,  min{/7’(0)  :  0  E  F^}  >  0,  thanks  to  the  lower-semicontinuity 
of  It  and  the  compactness  of  F^. 

Now  fix  an  arbitrary  6  (the  specific  value  of  S  is  not  important),  and  let 
to  =  M  -h  4(5.  Define 

s  =  ^  min{/f,(0)  :  0  E  F^f}  >  0. 

For  any  x  and  0  E  ^x(s),  by  Lemma  A. 3  again,  0  is  Lipschitz  continuous 
with  \(j)\  <  K.  However,  <^3,(5)  H  F^^  =  0  by  definition.  Therefore,  for  any 
X  E  [0,  M]  and  0  E  ^^3,(5),  we  must  have 

inf{t  >  0  :  0(t)  0  [—2(5,  M  +  25]}  <  to-  (A.4) 

Define  the  following  stopping  time 

=  inf{t  >  0  :  Qn{t)  0  [-5,  M  +  5]}. 

Thanks  to  Lemma  A. 2,  each  increment  of  {Q{k)}  is  uniformly  bounded  in 
absolute  value  by  M,  which  in  turn  implies  that  Qn  has  Lipschitz  continuous 
sample  paths  and  \Qn\  <  M-  Moreover,  for  any  initial  condition  Z(0)  =  z  E 
Z„,  Lemma  A. 2  implies  Qn(0)  =  S{z)/n  E  [0,  M].  It  follows  that 

p.  (r^  >  to)  =  P.  {Qu  e  Fl)  . 
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Thanks  to  equation  (A. 4),  for  every  Qn  G  we  have 

p{Qn,^S(z)/n{s))  >  5, 

Therefore, 

>  to)  <  (p(Qn,  ^S(^)/n('S))  >  S) 

However,  it  follows  from  Lemma  A. 2  that  {cr„  A  (Tq  >  C  {r^  >  to}  for 
n>  M/6.  As  a  consequence, 

limsup  sup  —  \og¥z{<Tn  A  do  >  nto) 

n  Z&7.U 

<  limsup  sup  —  logP2(r^  >  to) 

n  Z&Zn 

<  limsup  sup  -  logP^  {p{Qn,  ‘^Siz)/n{s))  >  5) 

n  z£Zn 

<  -S, 

here  the  last  inequality  is  by  Lemma  A. 3.  In  particular, 

sup  P2(crn  A  (To  >  [ratoj  +  1)  <  sup  P2(crn  A  (To  >  nto)  < 


for  n  big  enough.  Let  kn  =  [utoj  +  1.  Thanks  to  the  Markov  property,  for 
all  sufficiently  large  n  and  all  j  >  0 

sup  F^{an  A  do  >  jkn)  < 


Let  c  be  any  constant  such  that  0  <  c  <  s/(4to).  We  have,  for  n  big  enough, 
ckn  <  ns/^,  which  implies  that 

OO  +  — 1 


E, 


gC((T„Acro) 


e“P2((Tn  A  do  =  i) 


j=0  i=jk„ 


< 


< 


< 


gcfc„  ^  <  o-„  A  do  <  (j  +  1)A;„ 

i=o 


gCfc„  ^-j(nsl2-ckn) 


E' 

i=o 


gCfcn  ^  g-J>ls/4 

j=0 


„ckn 


1 


2  _  (i—ns/A  ' 


1) 


38 


Therefore, 


lim  sup  sup  —  log  Ez 


3c((T„Acro) 


ckn  ,  1  ,  1 

<  hm - h  hm  -  log  - - ^ 

n  n  n  n  1  — 


=  CtQ. 


This  completes  the  proof. 


It  remains  to  show  Lemmas  A. 2  and  A. 3.  The  proof  is  technical  and  we 
need  to  investigate  the  processes  in  great  detail.  We  begin  with  the  following 
result,  whose  proof  is  a  straightforward  consequence  of  the  definition  of  Q{k) 
and  thus  omitted. 

Lemma  A. 4.  Let  Tfc  =  cr(Z(0),  T(l), . . .,  y(fc)).  Then 

Ez[Q{k  +  l)-Q{k)\Tk]  =  -l 

for  every  z  G  and  every  A:  >  0. 

The  next  lemma  is  concerned  with  the  monotonicity  of  the  sample  path 
with  respect  to  the  initial  conditions.  To  be  more  precise,  for  z,  z  G  Z^,  we 
say  z  <  z  \i  the  inequality  holds  component- wise.  Also  for  G  Z^,  denote 
by  the  sample  path  corresponding  to  initial  condition  that  is, 

Q^(0)  =  2,  Q\k  +  1)  =  Q\k)  +  7r[g"(fe),  Y{k  +  1)]. 

Lemma  A. 5.  Define  g  :  Z^  — >  Z_|_  by  g{z)  =  zi  +  Z2-  Given  any  z,  z  £  Z^ 
such  that  z  <  z, 

Q\k)  <  Q\k) 

g{Q\k))-g{Q~\k))  <  g{z)-g{z) 


for  every  k  >0. 

Proof.  We  use  induction.  The  claim  is  trivial  for  k  =  0.  Assume  for  now 
that  it  holds  for  some  A:  >  0.  Introduce  the  following  notation: 

r  =  {z  G  Z+  :  >  0,  Z2  >  0}, 

Ti  =  {z  G  :  zi  =  0,  Z2  >  0}, 

r2  =  {z  G  Z+  :  >  0,  Z2  =  0}. 

We  consider  the  following  possible  scenarios  separately:  (i)  Q^{k)  G  T;  (ii) 
Q^{k)  G  Ti;  (hi)  Q^{k)  G  r2;  (iv)  Q^{k)  =  0.  Since  the  proofs  for  these 
cases  are  essentially  the  same,  we  choose  to  only  present  case  (ii).  Assume 
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that  Q^{k)  G  Fi-  Thanks  to  the  induction  hypothesis  Q^{k)  <  Q^{k),  we 
must  have  Q^{k)  G  Ti  or  Q^{k)  G  F.  If  Q^{k)  G  Fi,  or  Q^{k)  G  F  but 
F(A:  +  1)  7^  ui,  then  7r[Q^{k),  Y{k  +  1)]  =  Tr[Q^{k),  Y{k  +  1)]  and  the  claim 
holds  for  k  +  1.  It  only  remains  to  show  for  the  case  where  Q^{k)  G  F  and 
F(A;+  1)  =  vi-  In  this  case  Q^{k  +  1)  =  Q^{k)  and  Q^{k  +  1)  =  Q^{k)  + 
vi  =  Q^{k)  +  (—1, 1).  But  since  Q\{k)  >  0  and  Q\{k)  =  0,  it  follows  that 
Q^{k  +  1)  <  Q^{k  +  1).  Furthermore,  note  that  g  {Q^{k  +  1))  =  s'  {Q^{k)), 
g  {Q^{k  +  1))  =  g'  {Q^{k)).  This  completes  the  proof.  ■ 

Proof  of  Lemma  A. 2.  Let  M  =  25((1,  0))  +  25((0, 1)).  We  would  like  to 
show  that  for  any  z  G  and  any  i  =  0, 1,  2, 

\S{z  +  T:[z,Vi])  -  S{z)\<M.  (A. 5) 

We  can  assume  that  7r[z,  Vi]  =  u,,  since  otherwise  there  is  nothing  to  prove. 
First  we  consider  the  case  i  =  2,  and  let  z  =  z  +  V2  =  {zi^  Z2  —  Y)  <  2;.  Define 
stopping  times 


=  inf{A:  >  0  :  Q^{k)  =  0},  =  inf{A:  >  0  :  Q\k)  =  0}. 

Thanks  to  Lemma  A. 5,  we  have  Q^{k)  <  Q^{k)  for  any  A:  >  0,  which  implies 
rpz  ^  rpz ^  gy  game  lemma,  for  every  A;  >  0,  g{Q^{k))  —  g{Q^{k))  < 
g{z)  —  g{z)  =  1.  In  particular,  for  k  =  T^,  we  have  g{Q^{T^))  <  1.  It 
follows  that 

Q^(nG{(0,0),(l,0),(0,l)}. 

Now  the  strong  Markov  property  yields 

S{z)  =  S{z)  +  P{Q^(n  =  (1,  0)}5((1, 0))  +  nQ\T^)  =  (0, 1)}5((0, 1)). 

Thus  15(2:)  —  S'(z)|  <  ^((l,  0))  +  ^((0, 1))  <  M/2.  The  proof  for  the  case 
i  =  0  is  almost  verbatim.  For  i  =  l,  z  +  Vi  =  z  +  (—1, 1).  One  can 
use  the  same  argument  to  prove  that  |5(2;)  —  S{z  +  (— 1,0))|  <  M/2  and 
|5(2;  +  (— 1, 0))  —  S{z  +  {—l,  1))|  <  M/2,  and  then  use  the  triangle  inequality 
to  show  15(2  +  ui)  —  5(2)1  <  M.  We  omit  the  details. 

It  follows  from  (A. 5)  that  the  increment  of  {Q{k)}  is  uniformly  bounded 
by  M  (note  that  M  >  1  trivially  since  S{z)  >  1  for  every  2  7^  0).  Now  for 
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every  z  e  Z^,  we  can  write  S{z)  as 


S{z)  =  [5(z)  -  5((0,  +  Z2))]  +  [5((0,  +  Z2))  -  ^((O,  0))] 

21-1 

=  '^[S {z  +  ivi)  -  S{z  +  {i  +  l)vi)] 

i=0 

Zl+22-1 

+  ^  [S{z  +  Zivi  -  iv2))  -  S{z  +  Zivi  -  {i  +  1)V2))]. 

i=0 

Thanks  to  (A. 5)  again,  the  absolute  value  of  each  summand  is  bounded  by 
M.  Thus  S{z)  <  M{2zi  +  2:2).  Taking  M  =  2M  completes  the  proof.  ■ 


Proof  of  Lemma  A. 3.  Recall  the  definition  of  h{z',  a)  and  H{a)  by  (A.l)- 
(A.2).  We  first  show  that  H  is  convex  and  Lipschitz  continuous  with  H{0)  = 
0.  To  this  end,  note  that  h{z-,  •)  is  convex  and  satisfies  h{z]  0)  =  0  for  each 
fixed  z  G  Z^.  Therefore  H  is  convex  with  H(fi)  =  0.  Let  M  be  the  uniform 
bound  on  the  increments  of  {Q{k)}  given  by  Lemma  A. 2.  It  follows  easily 

\h{z;  a)|  <  M|a|. 


Therefore  \H(a)\  <  M\a\  for  every  a,  whence  H  is  Lipschitz  continuous 
(thanks  to  its  convexity). 

We  claim  that  H  is  differentiable  at  a  =  0  and  H'{G)  =  —1.  Indeed, 
since  h{z-,  a)  is  differentiable  with  respect  to  a  and  h{z\  0)  =  0,  we  have 


Jiia)  h(z:a) 

- =  sup  - 

a  a 


sup  Dah{z]  a[z]), 
2ez2 


where  a[z\  is  some  number  between  0  and  a.  But  thanks  to  Lemma  A. 4, 
Dah{z]0)  =  Ez[Q{l)  —  <5(0)]  =  —1.  Moreover,  Lemma  A. 2  and  simple 
algebra  yield  that  \Daah{z',  a)\  <  K  for  some  constant  K  and  for  every 
z  G  Z^  and  a  G  M.  It  follows  that 


H{a) 

a 


+  1 


<  K\a 


which  converges  to  0  as  a  — >  0,  or  is  differentiable  at  a  =  0  with  H'{0)  = 

-1. 

The  convexity  of  H  and  H(0)  =  0  imply  that  L,  defined  by  (A. 3),  is 
convex  and  non-negative.  The  Lipschitz  continuity  of  H  implies  that  L 
takes  value  infinity  outside  a  compact  set.  Lastly,  the  differentiability  of  H 
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at  a  =  0  with  H'{0)  =  —1  imply  that  L{P)  =  0  if  and  only  if  /3  =  —1.  Parts 
1  and  2  of  Lemma  A. 3  follow  from  these  properties  of  L.  The  rest  of  the 
lemma  follows  from  Theorem  4.1  of  [5]  and  that 

L{P)  <  l{z]  (5)  =  sup[q!/3  —  h{z]  a)]. 

a 

This  completes  the  proof.  I 


B  Appendix.  Proof  of  main  theorems 

We  put  the  proofs  of  the  main  results,  Theorem  3.14  and  Theorem  3.17, 
together  in  this  appendix.  These  proofs  are,  in  essence,  verification  type 
arguments.  We  start  with  a  few  useful  technical  results. 

Lemma  B.l.  The  function  as  defined  in  (3.13)  satisfies  the  following. 

1.  >  0  for  all  x  £  D, 

2.  <  0  for  all  x  E  d^, 

3.  {DW^’^{x),di)  >  —27  exp  {—(5/e}  for  every  x  E  9*. 

f.  There  exists  a  constant  C  which  only  depends  on  the  system  parameter 
(A,^i,//2),  such  that 


dxjdxj 


<3 

e 


for  every  x  £  D  and  every  i,j. 


Proof.  Thanks  to  (3.14),  the  concavity  of  H  (Proposition  3.4),  and  that 
H(^fc)  >  0,  it  follows  that 


m{DW^’^{x))  =  H  (x)rfc  >  Y,Pk\^Mrk)  >  0. 

Vfc=l  /  k=l 

By  Lemma  3.12  we  have  W^'^{x)  <  W^{x).  But  W^{x)  <  0  for  x  £  d^hy 
definition,  and  so  the  second  claim  follows. 

Since  (ri,  di)  =  {r^,di)  =0  and  (r2,  di)  =  —27,  we  have 

{DW^^\x),df)  =  -2ypf{x). 

For  X  E  Si,  thanks  to  (3.15)  and  (3.12),  we  have 

P2\x)  <  ^ 

exp{-W|(x)/e| 
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27  exp{— (5/e}. 


Similarly,  for  x  ^  82,  we  have 
{DW^’\x),d2)  =  -2'ypl’\x)  >  -2j 


exp  [—W({x)/e'^ 
exp{-l^|(a:)/e} 


This  en(is  the  proof  of  the  thir(i  claim. 

As  for  the  last  claim,  it  follows  easily  from  (3.14)  that 


d'^W^’^ix)  _  ^  dpf{x) 


dxidxj 


E 

k=l 


dx 


{rk,ei), 


3 


where  e,  is  the  standard  i-th  unit  vector.  However, 

^  3 


dp'fix)  1 


dxj 


£,<5/  N 

=  --Pk  (®) 


£,<5/  N 
=  Z'pk  (^) 


dW^,{x) 

dxj 


+  E 


m=l 

3 


dWjix) 

dXn 


-{rk,ej)  +  E  Pmix){rm,ej 


m=l 


The  last  claim  follows  readily  from  the  definition  of  {vk}  and  that  is 

bounded  between  0  and  1.  ■ 


We  now  dehne  a  few  functions  that  are  closely  related  to  the  interior 
Hamiltonian  H  and  the  boundary  Hamiltonians.  For  each  o;  >  0  and  0,  0  G 
T+(V),  let 

2 


Z(a,p;  0,  0)  =  (1  +  a){p,¥{0))  +  (1  +  2a)  E^N  log  11^  +  ^(^ll©)- 

i=o 

Similarly,  for  each  j  =  1,2,  let 


and  define 
L 


,(a,  p;  0,  0)  =  (1  +  a) (p,  F/0))  +  (1  +  2a)  E  log  +  ^(^ll©)- 

i=o 

Lemma  B.2.  Letp  G  such  that  B[(p)  >  0.  Then  for  any  given  a>t)  we 
have 

inf  L(a, p;  0*(p),  0)  >  0, 
e&v+(Y) 

where  0*(p)  is  as  defined  in  Proposition  3.4- 
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Proof.  By  definition  of  L,  (3.7),  and  Proposition  3.4,  it  is  not  difficult  to 
check  that 


L{a,p]e*{p),9)  =  L{0,  p]  e*{p),  9)  +  2a  log  N{p) 
=  L(O,p;0*(p),0)  +  aH(p). 

However,  thanks  to  Proposition  3.4  again,  we  have 

^^mf^^L(O,p;0*(p),0)  =  L(0,p;  O* (p) ,  9* (p))  =  m{p) . 


This  completes  the  proof. 


Proof  of  Theorem  3.14.  To  ease  exposition,  we  will  use  the  notation 
W  =  pk  =  and  set  e  =  27exp{— 5/e}.  Fix  any  a  >  0.  We  claim 

that,  for  every  x  G  D^, 


inf  L(a,DW(x)]Q^[-\x],9)>0.  (B.l) 

9e9+{Y) 


Indeed,  thanks  to  the  definition  of  L,  the  concavity  of  the  logarithmic  func¬ 
tion,  and  that  DW{x)  =  '^pk{x)rk,  0”[-|x]  =  Pk{x)Q*{rk),  one  can 
check  that 

2 

L{a,  DW{x)-,@'^[-\x],9)  >  y^^pk{x)L{a,rk-,&*{rk),9). 

k=0 

Since  H(rfc)  >  0,  it  follows  from  Lemma  B.2  that 


inf  L(a,rfc;0*(rfc),6')  >  0, 
0ey+(V) 


which  in  turn  implies  (B.l).  We  claim  that  for  every  x  ^  dj  r\  Dn, 


inf  Lj(a,  DW(x)]  0”[-|a:],  9)  >  —(1  -|-  a)e. 
0ey+(V) 


(B.2) 


Indeed,  thanks  to  (B.l), 

Lj{a,DW{x);e'^[-\x],9) 

=  L{a,  DW{x)]  0”[-|x],  0)  -  (1  +  a)9[vj]  •  {DW{x),  vj) 

>  -{1  +  a)9[vj]  ■  {DW{x),Vj). 

Recalling  that  dj  =  —vj,  (B.2)  now  follows  readily  from  Lemma  B.l. 
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We  now  show  that  inequalities  (B.l)  and  (B.2)  imply 


inf  <  +  a) 

0ey+(V)  I 


n 


.  i=0 


W  [  X  -I — Tr{x,  Vi)  )  —  W{x) 


e[vi 


+(l  +  2a)  J^0[7;i]log^^Eg^  +  i?(0||0)  }  >  -(1  +  a) 


i=0 


Q[vi, 


(B.3) 

'c  : 

- h  £ 

ne 


for  every  x  G  Dn,  where  C  is  a  constant  that  only  depends  on  the  system 
parameter  (A,  //i,  /i2)-  To  this  end,  we  consider  separately  the  cases  x  G  Dn 
(interior)  and  x  G  djCiDn  (boundary).  For  x  G  D^,  7r{x,  Vi)  =  Vi.  Therefore, 
by  a  Taylor  series  expansion, 

■9[vi]  =  {DW{x),Vi)-9[vi]+-;^{vi,D'^W{xi)vi)-9[vi], 

In 

where  Xi  is  some  point  on  the  line  connecting  x  and  x  +  Vi.  Thanks  to 
Lemma  B.l,  the  definition  of  F  [see  (3.7)],  and  that  ||ui|p  <  2,  we  have 


n 


W  [X+ -Vi]  -  W{x) 


E 

i=0 


W  \  x  +  —Vi 
n 


W{x) 


■9[vi]  >  {DW{x),¥{9)) 


_C 

ne 


This  and  inequality  (B.l)  immediately  lead  to  (B.3).  The  case  of  x  G  djCiDn 
is  similar,  except  now  that  7r(x,  Vi)  =  Vi  i  ^  j  and  7r(x,  Vj)  =  0.  We  omit 
the  details. 

Applying  the  relative  entropy  representation  (Remark  3.3)  to  the  left- 
hand-side  of  (B.3)  and  adopting  the  notation  /3„  =  Cjine)  +£,  we  have,  for 
every  x  e  Dn, 


g— (l+a)/3„ 


2 


'y  [[  ^-{l+ct)n[W(x+TT(x,Vi)/n)-W(x)\ 
i=0 


Qbi] 


l+2a 

•  Obi]  <  1- 


Recalling  the  definition  of  W”  in  (3.4),  this  display  implies  that  the  process 
M  =  {AI{k)  :  k  >  0},  where 


/  k—1 

]\,f(U)  =  p-{i+a)0„k  -{l+a)nW(X"{k))  I  TT  Q[T(j  -|-  1)] 

^  '  111  M  T)l  yn/'i 


l+2a 


is  a  supermartingale  under  the  original  probability  measure  P.  Thanks  to 
the  Optional  Sampling  Theorem  and  the  non-negativity  of  M, 

£;^M(T„  ATo)  <  F^M(O)  = 
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Recalling  that  Pn  =  Pn  ■  l{r„<To}  W{x)  <  0  for  x  e 
M{Tn  A  To)  >  M{Tn)  ■  l{T„<ro} 

—  f^—0-+ot)PnTn^—{l+a)nW{X'^(Tn))pl+2a 


>  p— (l+“)/3ra7ri  "l+2a 

—  Pn 


It  follows  that 


p— (l+“)/3ra7ti  ~l+2a 

Pn 


<  e 


—  {l+a)nW{0) 


By  Holder’s  inequality, 


[2nd  moment  of  Pn]  =  E^[pn] 


<  E^ 


^  —  {^P<y)l3nPn 

Pn 


1+20 


E^ 


l+2a 


'e^MTn/\To) 


l+2a 


which  yields 


lim  inf - log  [2nd  moment  of  Pn] 

n  n 


> 


1  -j-  cr 


IH(0)  - 


2q; 


1 


lim  sup  —  log  E^ 
l  +  2a  n  n 


1  +  2a 

Let  c  be  the  constant  in  Proposition  A.l,  and  let 


g^dnHnATo)' 


(B.4) 


1 

C  =  lim  sup  sup  —  log  E, 


x&D, 


n 


ATq) 


It  follows  immediately  from  Proposition  A.l  that  C  is  finite.  Note  that  (B.4) 
holds  for  any  a  >  0.  In  particular,  it  holds  for  a  =  ejc.  With  this  choice  of 
a,  we  have 

\  Oi  C  E  C 

~  Pn  — 


2a 


2a  ne^  2^  2' 


Therefore,  if  e  <  c,  then  for  n  big  enough, 

1  T  rr 


2a 


~Pn  ^  C, 


and  (B.4)  yields 


1  1  “1“  cx  2cx  — 

lim  inf - log  [2nd  moment  of  >  - VP(0) - C. 

n  n  1  +  2a  1  +  2a 
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The  right-hand-side  of  the  last  display  equals 

^(0)  -  rr^mO)  +  2C]  =  1T(0)  -  £^-[1^(0)  +  2C]. 

1  -|-  2a  c  -|-  2e 

However,  since  VT(0)  <  27,  we  have 

lim inf - log  [2nd  moment  of  Pn]  >  W{0)  —  e- [27  -|-  2(7] . 

It  follows  from  Lemma  3.12  that 

IT(0)  =  iy=’^(0)  >  W\{))  -  3e  =  27  -  35  -  3e.  (B.5) 

Recall  that  e  =  27exp{— 5/e}.  We  conclude  the  proof  by  setting  A  = 
27[27  -|-  2(7] /c,  and  to  enforce  e  <  c  (which  was  assumed  in  the  proof)  we 
set  B  =  l/log(27/c)  if  c  <  27  and  R  =  00  if  c  >  27.  ■ 


Proof  of  Theorem  3.17.  It  suffices  to  show  that 

lim  inf - log  [2nd  moment  of  pA  >  27, 

n  n 

since  the  other  direction  is  automatic  by  Jensen’s  inequality  (see  Section  2). 
We  use  the  notation  and  Sn  =  exp{— 5„/e„}. 

The  same  argument  leading  to  inequality  (B.4)  gives  that,  for  any  strictly 
positive  sequence  {a„}. 


lim  inf - log  [2nd  moment  of  Pn] 

n  n 

>  lim  inf  ,  ^  W^ifd)  —  lim  sup 


1 


1  -|-  2aj^ 


1  -I-  2an  n 


\ogE^ 


,^/3™(T„ATo)' 


where 

(j 

Pn  —  T 
nSn 

In  particular,  we  should  choose  a„  so  that 


1  +  Q^n 
2aj^ 


Pn  —  C, 


or,  an 


Pn 

‘2c-  Pn' 


Note  that  a„  is  strictly  positive  (at  least  for  n  big  enough)  and  a^  — >  0  since 
/3„  — >  0  by  assumption.  It  follows  that 

lim  inf - log  [2nd  moment  of  Pn\  >  liminf  VL’^(O). 

n  n  n 


However,  by  (B.5)  IT"(0)  >  27  —  35„  —  36^.  This  completes  the  proof.  ■ 
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C  Appendix.  Proof  of  Proposition  3.4 


Fix  arbitrarily  p  e  For  p  G  and  0,  0  G  1P''“(V),  define 


L(p;  0,  e)  =  {p,  F(0))  +  ^  e[vi\  log  11^  +  i?(0||0). 


It  follows  from  the  definition  of  relative  entropy  that 


L(p;  0,  6) 


2  2 

(p,  F(fl)>  -  ^  9[pj]  l„g|M  +  ^  9[„i]  log  +  _R(p||e) 

i=0  ^  i=0  * 

(p,F(0))-i?(0||0)  +  2i?(0||0). 


We  first  show  that  (0*(p),  0*ip))  is  a  saddle  point,  or 


L(p;  0,  9*{p))  <  L{p-  0*(p),  9*{p))  <  L{p-  0*(p),  9) 


for  every  0,  0  G  !?■*■(¥).  The  first  inequality  follows  from  the  non-negativity 
of  relative  entropy,  that  R{'^\\9)  =  0  if  and  only  if  7  =  0,  and  that  0*(p)  = 
9*(p).  Now  we  consider  the  second  inequality.  It  is  easy  to  verify  that 


and 


Therefore, 


&*{p)[vi] 

0[Ug 


logiV(p)  -  -{p,Vi), 


2 


{p,¥{9))  =  '^9[vi]{p,vi). 

i=0 


L{p-  @*{p),  9)  =  logiV(p)  +  l^9[vi]{p,  Vi)  +  i?(0||0). 

i=0 

A  straightforward  calculus  computation  shows  that  L(p;  0*(p),  0),  as  a  func¬ 
tion  of  0,  attains  its  minimum  aX  9  =  9*{p).  The  second  inequality  follows 
readily. 

The  existence  of  the  saddle  point  (0*(p),  9*{p))  implies  that 
m{p)=L{p-  @*{p),9*{p))  =  2logN{p), 


and  that  the  order  of  sup  and  inf  can  be  exchanged,  or 


H(p) 


inf  sup  L(p-,<d,9). 
0ey+(V)  0gy+(v) 
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Thus 


_  sup  L{p]Q,9)  =  _  sup  [(p,  F(0))  -  i?(0||0)  +  2i?(0||0)] 

0gy+(v)  0ey+(V) 

=  (p,F(0))  +  2/?(0||0)  -  inf  i?(0||0) 

0gy+(V) 

=  (p,F(0))  +  2i?(0||0). 

Since  H  is  the  infimum  of  affine  functions  (ofp),  it  is  concave.  This  completes 
the  proof.  ■ 


D  Collection  of  miscellaneous  proofs 


Proof  of  Lemma  4.4.  Clearly,  BI(rrf+i)  =  B[(0)  =  0.  For  1  <  A;  <  d, 
Proposition  4.1  implies  that  B[(rfc)  =  21ogA^(rfc)  where 

1  _  ^ 

■777 — r  =  /r  +  pi  +  •  •  •  +  Pd-fc  H - z - h  f^d+2-k  +  •  •  •  +  t^d- 

N{rk)  // 

In  order  to  show  N(rk)  >  1,  it  suffices  to  show  that 


H - = -  <  A  +  Pd+l-k: 

or  equivalently 

{pd+i-k  -  /I)  (/I  -  A)  >  0, 

which  directly  follows  from  the  assumptions.  Therefore  we  have  Bl(rfc)  >  0. 
Furthermore,  for  x  G  F  we  have 


W'^{x)  <  Wi{x)  =  — 27(xi  +  X2  +  ■  ■  ■  +  Xd)  +  27  —  d  <  — d  <  0. 


Now  assume  x  €  di  for  some  1  <  i  <  d.  Suppose  DW'^{x)  is  well 
defined,  or  equivalently,  the  Wi{x)  A  •••  A  W2j^^{x)  =  WJ^^ix)  for  some 
unique  k*  G  {1,  2, . . . ,  d  +  1}.  In  this  case,  DW^{x)  =  Vk* ,  and  we  wish  to 
show  (rfc*,  dj)  >  0.  However,  for  every  k  we  have 


(rfc,  di)  =  I 

Thus  it  suffices  to  show  that  k* 
of  {rfc}  and  x*  =  0  imply 


-27, 

0  , 

7^  d  + 1 


iffe  +  i  =  d+  l 
otherwise 


(D.l) 


—  i.  This  is  true,  since  the  definition 


IFrf+2-i(^)  =  {rd+2-i,  x)  + 'y  -  {d  +  2  -  i)6n 

=  {rd+i-i,x)  +  'y  -  {d  +  2-i)6n 

=  W2^,_,{x)-6 
<  W2+1-^{X). 
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It  remains  to  show  that  {DW^'^’'^{x),  di)  >  — 27 exp{— (5„/e„}  for  x  e  di 
We  have 

d+1 

DW^-’^{x)  =  J2p'r"ix)rk. 

k=l 

Thanks  to  (D.l),  we  only  need  to  show  p^J]!i_^{x)  <  exp{— (5„/e„}  for  x  ^  di 
To  this  end,  observe  that 


er^,n  ^  exp{-W”+^_.(x)/e„} 


C71 .  »fc  /  \ 

Pd+l-ii^)  ^ 


exp{-W"^2-i(®)/^»^} 


=  exp{-5„/e„}. 


This  completes  the  proof. 


Proof  of  Lemma  5.2.  We  will  only  present  the  proof  for  the  case  pi  < 
and  omit  the  analogous  proof  for  pi  >  p2- 

Assume  pi  <  p2  hereafter,  and  use  the  notation  IT  =  IT^’^  and  pk  = 
The  formulae  in  Remark  5.1  yield 


H(ri)  =  21ogA^(ri) 


-2  log 


(1  —  /3)pi  +  Pi  +  Pp2  + 


Ap2 

Pi 


By  assumption  A  <  (1  —  /3)pi  and  pi  <  P2.  Therefore 


((i-/3)/ii-A)  >0^(l-/3)pi  +  ^  <  A  +  (1-/3)p2. 

\Pi  J  Pi 

Since  A  +  pi  +  p2  =  1,  we  have  B[(ri)  >  0.  Similarly,  we  have  B[(r2)  =  0 
and  H(r3)  =  0.  Thanks  to  the  concavity  of  H,  DW{x)  =  and 

^^Pfc(x)  =  1,  pk{x)  >  0,  we  have  Il{DW{x))  >  0.  As  for  x  G  d^-,  we  have 
lT(x)  <  (ri,  x)  +  27  —  (5  =  — (5  <  0. 

It  remains  to  show  the  part  3.  Thanks  to  the  concavity  of  we  have, 
for  X  £  di, 


3 


2 


IIq^{DW{x))  >  pk{x)IlQ.{rk)  =  '^pk{x)Ua^{rk). 

k=l  k=l 


However,  it  is  not  difficult  to  show  that 


Hai(n)>0,  B[a2(r’2)  =  0. 


Therefore,  we  only  need  to  show  p2{x)  <  exp{— d/e}  for  x  £  di  and  pi{x)  < 
exp{— (5/e}  for  x  £  d2-  For  x  £  d2,  we  have  2:2  =  0  and 


<  exp{-Wf(x)/e} 
exp{-lT|(x)/e} 


exp{— (5/e}. 
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For  X  =  (0,  X2)  G  9i,  we  consider  two  cases:  X2  <  and  X2  >  separately, 
where  X2  =  5 /a.  For  X2  <  X2-,  we  have 


P2{x)  < 


exp{-VF|(g;)/£} 

exp{-tF|(x)/e} 


exp 


2(7-  a) 

e 


X2  + 


<  exp{— (5/e}. 


Similarly,  for  x  >  X2,  we  have 


P2{x)  < 


exp{-lF|(3;)/£} 
exp{— hF/(a:)/e} 


f  2a  (5 } 
exp  < - X2  +  -  > 

I  e  e  J 


<  exp{— (5/e}. 


This  completes  the  proof. 
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